
This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.

Data-driven Estimation of the Algebraic Riccati Equation for
the Discrete-Time Inverse Linear Quadratic Regulator Problem

Shuhei Sugiura1, Ryo Ariizumi2, Masaya Tanemura3, Toru Asai1, and Shun-ichi Azuma4

*This work was supported in part by the Japan Society for the Promotion of Science KAKENHI under Grant JP22K04027 and in part by the JST FOREST Program under Grant JPMJFR2123.
1Shuhei Sugiura and Toru Asai are with the Graduate School of Engineering, Nagoya University, Nagoya, Japan (sugiura.shuhei.j6@s.mail.nagoya-u.ac.jp, asai@nuem.nagoya-u.ac.jp).
2Ryo Ariizumi is with the Department of Mechanical System Engineering, Tokyo University of Agriculture and Technology, Tokyo, Japan (ryoariizumi@go.tuat.ac.jp).
3Masaya Tanemura is with the Graduate School of Science and Technology, Shinshu University, Nagano, Japan (tanemura@shinshu-u.ac.jp).
4Shun-ichi Azuma is with the Graduate School of Informatics, Kyoto University, Kyoto, Japan (azuma.shunichi.3e@kyoto-u.ac.jp).
Abstract

In this paper, we propose a method for estimating the algebraic Riccati equation (ARE) of an unknown discrete-time system from observations of the system state and input. The inverse optimal control (IOC) problem asks, “What objective function is optimized by a given control system?” The inverse linear quadratic regulator (ILQR) problem is an IOC problem that assumes a linear system and a quadratic objective function. The ILQR problem can be solved by solving a linear matrix inequality that contains the ARE. However, the system model is required to obtain the ARE, and it is often unknown in fields in which the IOC problem arises, for example, biological system analysis. Our method estimates the ARE directly from the observation data without identifying the system. This feature enables us to economize on observation data by using prior information about the objective function. We provide a data condition that is sufficient for our method to estimate the ARE. We conducted a numerical experiment to demonstrate that our method can estimate the ARE with less data than system identification if the prior information is sufficient.

I INTRODUCTION

The inverse optimal control (IOC) problem is a framework used to determine the objective function optimized by a given control system. By optimizing the objective function obtained by IOC, a different system can imitate the given control system’s behavior. For example, several studies have been conducted on robots imitating human or animal movements [1, 2, 3]. To apply IOC in these studies, the researchers assumed that human and animal movements are optimal with respect to unknown criteria. Such an optimality assumption holds in some cases [4].

The first IOC problem, which assumes a single-input linear time-invariant system and quadratic objective function, was proposed by Kalman [5]. Anderson [6] generalized the IOC problem in [5] to the multi-input case. For this type of IOC problem, called the inverse linear quadratic regulator (ILQR) problem, Molinari [7] proposed a necessary and sufficient condition for a linear feedback input to optimize some objective functions. Moylan et al. [8] proposed and solved the IOC problem for nonlinear systems. IOC in [5, 6, 7, 8] is based on control theory, whereas Ng et al. [9] proposed inverse reinforcement learning (IRL), which is IOC based on machine learning. Recently, IRL has become an important IOC framework, along with control theory methods [10].

We consider the ILQR problem for an unknown discrete-time system. Such a problem often occurs in biological system analysis, which is the main application target of IOC. The control theory approach solves the ILQR problem by solving a linear matrix inequality (LMI) that contains the algebraic Riccati equation (ARE) [11]. However, the system model is required to obtain the ARE. The IRL approach also has difficulty solving our problem. There is IRL for unknown systems [12] and IRL for continuous state and input spaces [13]; however, to the best of our knowledge, no IRL method handles both.

In the continuous-time case, we proposed a method to estimate the ARE directly from observation data of the system state and input [14]. In the present paper, we estimate the ARE for a discrete-time system using a method that extends the result in [14]. As in [14], our method transforms the ARE by multiplying both sides by the observed states and inputs. However, this technique alone cannot evaluate the ARE without the system model because the form of the ARE differs between continuous and discrete time. We solve this problem by using inputs generated by the system’s controller. The use of such inputs also enables us to estimate the ARE without knowing the system’s control gain. We prove that the equation obtained from this transformation is equivalent to the ARE if the dataset is appropriate. The advantage of our method is that it economizes on observation data by using prior information about the objective function. We conducted a numerical experiment to demonstrate that our method can estimate the ARE with less data than system identification if the prior information is sufficient.

The structure of the remainder of this paper is as follows: In Section II, we formulate the problem considered in this paper. In Section III, we propose our estimation method and prove that the estimated equation is equivalent to the ARE. In Section IV, we describe numerical experiments that confirm our statement in Section III. Section V concludes the paper.

II PROBLEM FORMULATION

We consider the following discrete-time linear system:

$x(k+1) = Ax(k) + Bu(k),$ (1)

where $A \in \mathbb{R}^{n\times n}$ and $B \in \mathbb{R}^{n\times m}$ are constant matrices, and $x(k) \in \mathbb{R}^{n}$ and $u(k) \in \mathbb{R}^{m}$ are the state and the input at time $k \in \mathbb{Z}$, respectively.

Let $\mathbb{Z}_{+} = \{0, 1, 2, \ldots\}$. We write the set of real symmetric $n \times n$ matrices as $\mathbb{R}^{n\times n}_{\rm sym}$. In the LQR problem [15], we determine the input sequence $u(k)$ $(k \in \mathbb{Z}_{+})$ that minimizes the following objective function:

$J(u) = \sum_{k=0}^{\infty}\left[x(k)^{\top} Q x(k) + u(k)^{\top} R u(k)\right],$ (2)

where $Q \in \mathbb{R}^{n\times n}_{\rm sym}$ and $R \in \mathbb{R}^{m\times m}_{\rm sym}$.

We assume that the system (1) is controllable and that the matrices $Q$ and $R$ are positive definite. Then, for any $x(0)$, the LQR problem has a unique solution written as the following linear state feedback law:

$u(k) = -\left(R + B^{\top} P B\right)^{-1} B^{\top} P A\, x(k),$ (3)

where $P \in \mathbb{R}^{n\times n}_{\rm sym}$ is a unique positive definite solution of the ARE:

$A^{\top} P A - P + Q - A^{\top} P B\left(R + B^{\top} P B\right)^{-1} B^{\top} P A = 0.$ (4)

For $u$ defined in (3), $J(u) = x(0)^{\top} P x(0)$ holds.
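As a concrete illustration of (3) and (4), the following Python sketch computes $P$ by the classical fixed-point Riccati iteration and then forms the gain of (3). The iteration and the function names are our illustrative choices, not part of the proposed method.

```python
import numpy as np

def solve_dare(A, B, Q, R, iters=20000, tol=1e-13):
    """Fixed-point iteration for the discrete-time ARE (4).

    Converges to the stabilizing solution P when (A, B) is controllable
    and Q, R are positive definite."""
    P = Q.copy()
    for _ in range(iters):
        S = R + B.T @ P @ B
        P_next = A.T @ P @ A + Q - A.T @ P @ B @ np.linalg.solve(S, B.T @ P @ A)
        if np.max(np.abs(P_next - P)) < tol:
            return P_next
        P = P_next
    return P

def lqr_gain(A, B, Q, R):
    """Optimal feedback gain K in (3), so that u(k) = -K x(k)."""
    P = solve_dare(A, B, Q, R)
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    return K, P
```

Substituting the computed $P$ back into the left-hand side of (4) should give a residual near zero.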

In the ILQR problem, we find positive definite matrices $Q \in \mathbb{R}^{n\times n}_{\rm sym}$ and $R \in \mathbb{R}^{m\times m}_{\rm sym}$ such that an input $u(k) = -Kx(k)$ with the given gain $K \in \mathbb{R}^{m\times n}$ is a solution of the LQR problem, that is, $u$ minimizes (2).

We can solve the ILQR problem by solving an LMI. Let the system (1) be controllable. Then, $u(k) = -Kx(k)$ minimizes (2) if and only if there exists a positive definite matrix $P \in \mathbb{R}^{n\times n}_{\rm sym}$ that satisfies (4) and the following:

$K = \left(R + B^{\top} P B\right)^{-1} B^{\top} P A.$ (5)

By transforming (4) and (5), we obtain the following pair of equations equivalent to (4) and (5):

$A^{\top} P A - P + Q - K^{\top}\left(R + B^{\top} P B\right)K = 0,$
$B^{\top} P A - \left(R + B^{\top} P B\right)K = 0.$ (6)

Hence, we can solve the ILQR problem by determining $P$, $Q \in \mathbb{R}^{n\times n}_{\rm sym}$, and $R \in \mathbb{R}^{m\times m}_{\rm sym}$ that satisfy the following LMI:

$P > 0,\; Q > 0,\; R > 0 \quad {\rm s.t.}$ Equation (6). (7)

In biological system analysis or reverse engineering, the system model and control gain are often unknown, and the ARE (6) is not readily available. Hence, we consider the following problem of determining a linear equation equivalent to (6) from observations of the system state and input:

Problem 1 (ARE estimation problem)

Consider the controllable system (1) and the controller $u(k) = -Kx(k)$ with unknown $A$, $B$, and $K$. Suppose that $N_d$ observation data $\left(x_i(0), u_i, x_i(1)\right)$ $\left(i \in \{1, \ldots, N_d\}\right)$ of the system state and input are given, where

$x_i(1) = A x_i(0) + B u_i \quad \left(i \in \{1, \ldots, N_d\}\right).$ (8)

Let $N'_d \leq N_d$. The first $N'_d$ inputs in the data are obtained from the unknown controller as follows:

$u_i = -K x_i(0) \quad \left(i \in \{1, \ldots, N'_d\}\right).$ (9)

Determine a linear equation in $P$, $Q \in \mathbb{R}^{n\times n}_{\rm sym}$, and $R \in \mathbb{R}^{m\times m}_{\rm sym}$ with the same solution space as (6).

We do not assume that the data are sequential, that is, that $x_{i+1}(0) = x_i(1)$ always holds. However, in an experiment, we show that our result can also be applied to a single sequential dataset. The ARE has multiple solutions, and we call the set of these solutions, which is a linear subspace, the solution space.

III DATA-DRIVEN ESTIMATION OF ARE

The simplest solution to Problem 1 is to identify the control system, that is, the matrices $A$, $B$, and $K$. We define matrices $X(k) \in \mathbb{R}^{n\times N_d}$, $X'(k) \in \mathbb{R}^{n\times N'_d}$, $U \in \mathbb{R}^{m\times N_d}$, $U' \in \mathbb{R}^{m\times N'_d}$, and $D \in \mathbb{R}^{(n+m)\times N_d}$ as follows:

$X(k) = \left[x_1(k)\;\cdots\;x_{N_d}(k)\right],$
$X'(k) = \left[x_1(k)\;\cdots\;x_{N'_d}(k)\right] \quad \left(k \in \{0, 1\}\right),$
$U = \left[u_1\;\cdots\;u_{N_d}\right], \quad U' = \left[u_1\;\cdots\;u_{N'_d}\right],$
$D = \left[X(0)^{\top}\;U^{\top}\right]^{\top}.$ (10)

Let the matrices $D$ and $X'(0)$ have full row rank. Then, the matrices $A$, $B$, and $K$ are identified by the least-squares method as follows:

$\left[A\;B\right] = X(1) D^{\top}\left[D D^{\top}\right]^{-1},$
$K = -U' X'(0)^{\top}\left[X'(0) X'(0)^{\top}\right]^{-1}.$ (11)
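The identification step (11) is a few lines of numpy; this is a minimal sketch, and the function name `identify_system` and the column-stacked data layout of (10) are our assumptions for illustration.

```python
import numpy as np

def identify_system(X0, X1, U, Nd_prime):
    """Least-squares identification (11).

    X0, X1: n x N_d matrices of initial and next states.
    U: m x N_d matrix of inputs; the first Nd_prime columns are closed-loop.
    Requires D = [X0; U] and X0[:, :Nd_prime] to have full row rank."""
    n = X0.shape[0]
    D = np.vstack([X0, U])                        # (n+m) x N_d data matrix
    AB = X1 @ D.T @ np.linalg.inv(D @ D.T)        # [A B] of (11)
    X0p, Up = X0[:, :Nd_prime], U[:, :Nd_prime]
    K = -Up @ X0p.T @ np.linalg.inv(X0p @ X0p.T)  # K of (11)
    return AB[:, :n], AB[:, n:], K
```

With noiseless data and $N_d = n + m$ generic samples, the recovery is exact.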

In ILQR problems, prior information about the matrices $Q$ and $R$ may be available. However, such information cannot be used for system identification. We propose a novel method that uses this prior information and estimates the ARE with fewer observation data than system identification.

The following theorem provides an estimation of the ARE:

Theorem 1

Consider Problem 1. Let $F$ be the $N'_d \times N_d$ matrix whose $(i, j)$ element $f_{i,j}$ is defined as

$f_{i,j} = x_i(1)^{\top} P x_j(1) + x_i(0)^{\top}\left(Q - P\right)x_j(0) + u_i^{\top} R u_j.$ (12)

Then, the following condition is necessary for the ARE (6) to hold:

$F = 0.$ (13)
Proof:

We define $G_1 \in \mathbb{R}^{n\times n}$ and $G_2 \in \mathbb{R}^{m\times n}$ as

$G_1 = A^{\top} P A - P + Q - K^{\top}\left(R + B^{\top} P B\right)K,$
$G_2 = B^{\top} P A - \left(R + B^{\top} P B\right)K.$ (14)

The ARE (6) is equivalent to the combination of $G_1 = 0$ and $G_2 = 0$. Let $i \in \{1, \ldots, N'_d\}$ and $j \in \{1, \ldots, N_d\}$ be arbitrary natural numbers. From the ARE, we obtain

$\begin{bmatrix} x_i(0) \\ u_i \end{bmatrix}^{\top} \begin{bmatrix} G_1 & G_2^{\top} \\ G_2 & 0 \end{bmatrix} \begin{bmatrix} x_j(0) \\ u_j \end{bmatrix} = 0.$ (15)

Using assumptions (8) and (9), we can transform the left-hand side of (15) into the following:

$\begin{aligned}
&\begin{bmatrix} x_i(0) \\ u_i \end{bmatrix}^{\top} \begin{bmatrix} G_1 & G_2^{\top} \\ G_2 & 0 \end{bmatrix} \begin{bmatrix} x_j(0) \\ u_j \end{bmatrix} \\
&= x_i(0)^{\top}\left(A^{\top} P A - P + Q - K^{\top}\left(R + B^{\top} P B\right)K\right)x_j(0) \\
&\quad + x_i(0)^{\top}\left(A^{\top} P B - K^{\top}\left(R + B^{\top} P B\right)\right)u_j \\
&\quad + u_i^{\top}\left(B^{\top} P A - \left(R + B^{\top} P B\right)K\right)x_j(0) \\
&= \left(A x_i(0) + B u_i\right)^{\top} P \left(A x_j(0) + B u_j\right) - u_i^{\top} B^{\top} P B u_j \\
&\quad - \left(K x_i(0) + u_i\right)^{\top}\left(R + B^{\top} P B\right)\left(K x_j(0) + u_j\right) \\
&\quad + u_i^{\top}\left(R + B^{\top} P B\right)u_j + x_i(0)^{\top}\left(Q - P\right)x_j(0) \\
&= f_{i,j}.
\end{aligned}$ (16)

Therefore, we obtain $f_{i,j} = 0$ from (15) and (16), which proves Theorem 1. ∎

Equation (13) is the estimated linear equation that we propose in this paper, and it can be obtained directly from the observation data without system identification. Our method obtains one scalar linear equation in $P$, $Q$, and $R$ from each pair of observation data. However, at least one datum of the pair must be a transition generated by the linear feedback input with the given gain $K$. Equation (13) consists of the $N_d N'_d$ scalar equations obtained in this manner. Therefore, if $N_d$ and $N'_d$ are sufficiently large, (13) is expected to be equivalent to (6), which is also linear in the matrices $P$, $Q$, and $R$.
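The construction of $F$ from (12) is a short computation on the column-stacked data matrices of (10). The following Python sketch is our illustration (the function name and data layout are assumptions):

```python
import numpy as np

def estimation_matrix(X0, X1, U, Nd_prime, P, Q, R):
    """Matrix F of Theorem 1, with f_{i,j} given by (12).

    Rows use the first Nd_prime (closed-loop) samples; columns use all
    N_d samples. F = 0 is the estimated linear equation (13)."""
    X0p, X1p, Up = X0[:, :Nd_prime], X1[:, :Nd_prime], U[:, :Nd_prime]
    return X1p.T @ P @ X1 + X0p.T @ (Q - P) @ X0 + Up.T @ R @ U
```

When $P$, $Q$, and $R$ satisfy the ARE (6) and the data obey (8) and (9), every entry of $F$ vanishes, as Theorem 1 states.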

Equation (12) is similar to the Bellman equation [16]. Suppose that the ARE (6) holds, that is, $u(k) = -Kx(k)$ minimizes $J(u)$. Then, the following Bellman equation holds:

$x(k)^{\top} P x(k) = x(k+1)^{\top} P x(k+1) + x(k)^{\top} Q x(k) + x(k)^{\top} K^{\top} R K x(k).$ (17)

The equation $f_{i,j} = 0$ is equivalent to the Bellman equation if $x_i(0) = x_j(0)$ and $u_i = u_j = -K x_i(0)$. The novelty of Theorem 1 is that $f_{i,j} = 0$ holds under the weaker condition that only $u_i = -K x_i(0)$. Theorem 1 can also be proved from the Bellman equation (17). See the supplemental material for the details.

The following theorem demonstrates that if the data satisfies the condition required for system identification, our estimation is equivalent to the ARE:

Theorem 2

Consider Problem 1. Suppose that the matrices $D$ and $X'(0)$ defined in (10) have full row rank. Then, the estimation (13) is equivalent to the ARE (6).

Proof:

From Theorem 1, the ARE (6) implies the estimation (13), that is, $F = 0$. As shown in the proof of Theorem 1, the matrix $F$ is expressed as follows using the matrices defined in (10) and (14):

$F = \begin{bmatrix} X'(0) \\ U' \end{bmatrix}^{\top} \begin{bmatrix} G_1 & G_2^{\top} \\ G_2 & 0 \end{bmatrix} D.$ (18)

Because the matrix $D$ has full row rank, $F = 0$ implies

$X'(0)^{\top} G_1 + U'^{\top} G_2 = 0,$
$X'(0)^{\top} G_2^{\top} = 0.$ (19)

Because $X'(0)$ has full row rank, we obtain $G_1 = 0$ and $G_2 = 0$ from (19), which proves Theorem 2. ∎

The advantage of our method is the data economization enabled by prior information about $Q$ and $R$. Suppose that some elements of $Q$ and $R$ are known to be zero in advance, and that the independent elements are fewer than the scalar equations in the ARE (6). Then, our method may estimate the ARE with fewer than the $n + m$ data required for system identification.

For fixed $N_d$, our method provides the largest number of scalar equations when $N'_d = \min\{n, N_d\}$. We use the following lemma to explain the reason:

Lemma 1

Suppose that there exist $k \in \{1, \ldots, N'_d\}$ and $c_i \in \mathbb{R}$ $\left(i \in \{1, \ldots, N'_d\}\backslash\{k\}\right)$ that satisfy

$d_k = \sum_{i=1, i\neq k}^{N'_d} c_i d_i,$ (20)

where $d_i = \left[x_i(0)^{\top}\;u_i^{\top}\right]^{\top}$ for any $i \in \{1, \ldots, N_d\}$. Then, the following holds for any $j \in \{1, \ldots, N_d\}$:

$f_{k,j} = \sum_{i=1, i\neq k}^{N'_d} c_i f_{i,j} \quad (j \neq k), \qquad f_{k,k} = \sum_{i=1, i\neq k}^{N'_d} \sum_{l=1, l\neq k}^{N'_d} c_i c_l f_{i,l}.$ (21)
Proof:

We can readily prove Lemma 1 using the following expression, derived from (18):

$f_{k,j} = d_k^{\top} \begin{bmatrix} G_1 & G_2^{\top} \\ G_2 & 0 \end{bmatrix} d_j \quad \left(k \in \{1, \ldots, N'_d\},\; j \in \{1, \ldots, N_d\}\right).$ (22)

Lemma 1 means that if datum $d_k$ is a linear combination of the other data, the equations obtained from $d_k$ are also linear combinations of the equations not using $d_k$. Hence, in the noiseless case, such a datum as $d_k$ is redundant. If $N'_d > n$, at least one of $d_1, \ldots, d_{N'_d}$ is a linear combination of the other data because the inputs in those data are obtained from the same gain $K$. Therefore, taking $N'_d$ larger than $n$ reduces data efficiency. From the above and $f_{i,j} = f_{j,i}$, our method can provide $\frac{n(n+1)}{2} + n\left(N_d - n\right)$ equations if $N_d \geq n$.

As an example of prior information, we consider diagonal $Q$ and $R$. In this case, the number of independent elements of $P$, $Q$, and $R$ is $\frac{n(n+1)}{2} + n + m$. Hence, if $N_d$ is $N_{d\min}(n, m) = n + 1 + \lceil\frac{m}{n}\rceil$ or larger, we have at least as many equations as independent elements. Fig. 1 compares the numbers $N_{d\min}$ and $n + m$ of data required for our method and system identification, respectively, when $Q$ and $R$ are diagonal. As Fig. 1 shows, our method may estimate the ARE with less data than system identification. Moreover, this tendency becomes stronger as the number $m$ of inputs grows relative to the number $n$ of states.
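The counting argument above is easy to check numerically; the following sketch (function names are ours, not from the paper) compares the two data requirements under the diagonal-$Q$, $R$ assumption.

```python
import math

def num_equations(n, Nd):
    """Scalar equations provided by (13) with N'_d = min(n, N_d) and Nd >= n."""
    return n * (n + 1) // 2 + n * (Nd - n)

def ndmin_diagonal(n, m):
    """Minimum N_d for diagonal Q and R: solve
    n(n+1)/2 + n(N_d - n) >= n(n+1)/2 + n + m for integer N_d."""
    return n + 1 + math.ceil(m / n)
```

For $n = 100$ and $m = 50$ (the setting of Experiment 2), `ndmin_diagonal` gives 102, whereas system identification needs $n + m = 150$ data.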

Figure 1: Ratio $\frac{N_{d\min}}{n+m}$ for $5 \leq n \leq 200$ and $5 \leq m \leq 200$. The contour lines are not smooth, but this is not a mistake.

For noisy data, we discuss the unbiasedness of the coefficients in (13). The coefficient of the $(k, l)$ element $q_{k,l}$ of $Q$ in (12) is expressed as follows:

$\begin{cases} x_{i,k}(0)\,x_{j,l}(0) + x_{i,l}(0)\,x_{j,k}(0) & (i \neq j) \\ x_{i,k}(0)\,x_{i,l}(0) & (i = j), \end{cases}$ (23)

where $x_{i,k}(0)$ is the $k$th element of $x_i(0)$. Suppose that each datum is the sum of its true value and zero-mean noise distributed independently across data. Then, we can readily confirm that the coefficient (23) of $q_{k,l}$ in (12) is unbiased if $i \neq j$ and biased otherwise. The same holds for the coefficients of the elements of $P$ and $R$.
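The bias mechanism is elementary: a product of two independently perturbed factors is unbiased, whereas a squared noisy factor picks up the noise variance. A small Monte Carlo check (the numeric values here are arbitrary illustrations, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
x_true, y_true, sigma = 0.7, -0.3, 0.1   # arbitrary true values and noise level
N = 200_000

eps_x = rng.normal(0.0, sigma, N)
eps_y = rng.normal(0.0, sigma, N)

# i != j case: independent noise on each factor -> unbiased estimate of x*y
cross = np.mean((x_true + eps_x) * (y_true + eps_y))

# i == j case: the same noisy sample squared -> biased by sigma^2
square = np.mean((x_true + eps_x) ** 2)
```

`cross` concentrates around `x_true * y_true`, while `square` concentrates around `x_true**2 + sigma**2`.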

IV NUMERICAL EXPERIMENT

Figure 2: Singular values of the coefficient matrices $\Theta$ and $\hat{\Theta}$ in Experiment 1.
Figure 3: Relationship between the distance $d(S, \hat{S})$ and the number of significant digits in the computation in Experiment 1.
Figure 4: Singular values of the coefficient matrices $\Theta$ and $\hat{\Theta}$ in Experiment 2.

IV-A Distance Between Solution Spaces

In our experiments, we evaluated the ARE estimation using a distance between the solution spaces of the ARE and the estimation. Let $s \in \mathbb{R}^{N_v}$ be the vector of the independent elements of the matrices $P$, $Q$, and $R$ in any fixed order, where $N_v$ is determined by the prior information. Because the ARE (6) is linear in $s$, we can transform (6) into the following equivalent form:

$\Theta s = 0,$ (24)

where $\Theta \in \mathbb{R}^{N_{\rm ARE} \times N_v}$ is the coefficient matrix and $N_{\rm ARE} = \frac{n(n+1)}{2} + mn$. Similarly, the estimation (13) is transformed into the following equivalent form:

$\hat{\Theta} s = 0,$ (25)

where $\hat{\Theta} \in \mathbb{R}^{N_{\rm est} \times N_v}$ is the coefficient matrix and $N_{\rm est} = N_d N'_d - N'_d\left(N'_d - 1\right)/2$. We define the solution spaces $S$ and $\hat{S}$ of the ARE (24) and the estimation (25) as follows:

$S = \left\{s \in \mathbb{R}^{N_v} \,\middle|\, \Theta s = 0\right\}, \quad \hat{S} = \left\{s \in \mathbb{R}^{N_v} \,\middle|\, \hat{\Theta} s = 0\right\}.$ (26)

We define the distance between the solution spaces using the approach provided in [17]. Assume that $\Theta$ and $\hat{\Theta}$ have the same rank. Let $\Pi, \hat{\Pi} \in \mathbb{R}^{N_v \times N_v}$ be the orthogonal projections onto $S$ and $\hat{S}$, respectively. The distance between $S$ and $\hat{S}$ is defined as follows:

$d(S, \hat{S}) = \left\|\Pi - \hat{\Pi}\right\|_2,$ (27)

where $\|\cdot\|_2$ is the matrix norm induced by the vector 2-norm. The distance $d(S, \hat{S})$ is the maximum Euclidean distance between $\hat{s} \in \hat{S}$ and $\Pi\hat{s} \in S$ over $\|\hat{s}\|_2 = 1$, as explained in Fig. 2 and by the following equation:

$\max_{\hat{s} \in \hat{S}, \|\hat{s}\|_2 = 1} \left\|\Pi\hat{s} - \hat{s}\right\|_2 = \max_{\hat{s} \in \hat{S}, \|\hat{s}\|_2 = 1} \left\|\left(\Pi - \hat{\Pi}\right)\hat{s}\right\|_2 = d(S, \hat{S}).$ (28)

Hence, the distance satisfies $0 \leq d(S, \hat{S}) \leq 1$.

The orthogonal projections $\Pi$ and $\hat{\Pi}$ are obtained from orthonormal bases of $S$ and $\hat{S}$. To obtain these bases, we use singular value decompositions of $\Theta$ and $\hat{\Theta}$. In our experiments, we performed the computation using MATLAB 2023a.
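A possible implementation of the distance (27) computes the null-space projections via the SVD. The sketch below is our numpy rendering (the paper used MATLAB; function names are ours, and both matrices are assumed to have the same known rank):

```python
import numpy as np

def null_space_projection(M, rank):
    """Orthogonal projection onto {s | M s = 0}, assuming rank(M) = rank."""
    _, _, Vt = np.linalg.svd(M)
    V0 = Vt[rank:].T            # orthonormal basis of the null space
    return V0 @ V0.T

def solution_space_distance(Theta, Theta_hat, rank):
    """Distance (27) between the solution spaces of (24) and (25)."""
    Pi = null_space_projection(Theta, rank)
    Pi_hat = null_space_projection(Theta_hat, rank)
    return np.linalg.norm(Pi - Pi_hat, 2)   # spectral norm of the difference
```

The distance is 0 when the two null spaces coincide and reaches 1 when some unit vector of one space is orthogonal to the other, consistent with (28).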

IV-B Experiment 1

In this experiment, we confirmed Theorem 2 by solving Problem 1. We set the controllable pair $(A, B)$ as follows:

$A = \begin{bmatrix} -0.2 & -0.4 & -0.6 \\ 0.4 & -0.7 & -0.3 \\ -1.0 & -0.8 & -0.2 \end{bmatrix}, \quad B = \begin{bmatrix} 0.1 & -0.6 \\ -0.2 & 0.8 \\ 0.4 & -0.9 \end{bmatrix}.$ (29)

The given gain $K \in \mathbb{R}^{2\times 3}$ is the solution of the LQR problem for the following $Q$ and $R$:

$Q = \begin{bmatrix} 0.4 & -0.2 & 0.7 \\ -0.2 & 1.7 & -0.7 \\ 0.7 & -0.7 & 1.9 \end{bmatrix}, \quad R = \begin{bmatrix} 1.7 & 0.4 \\ 0.4 & 1.8 \end{bmatrix}.$ (30)

We considered the following $N_d = n + m = 5$ observation data with $N'_d = n = 3$, which satisfy the condition of Theorem 2:

$X(0) = \begin{bmatrix} 0.1 & 0.4 & 0.5 & -0.4 & -0.1 \\ 0.4 & 0.7 & 1.0 & 0.6 & 0.8 \\ -0.4 & -1.0 & 0.5 & -0.8 & -0.4 \end{bmatrix},$ (31)
$U = \left[\; -K\left[x_1(0)\;\cdots\;x_3(0)\right] \;\; \begin{matrix} 0.0 & 0.1 \\ -0.9 & -0.7 \end{matrix} \;\right].$ (32)

With this setting, each of the ARE (6) and our estimation (13) contained 12 scalar equations in $N_v = 15$ variables, that is, the independent elements of the symmetric matrices $P$, $Q$, and $R$.

Fig. 2 shows the singular values of $\Theta$ and $\hat{\Theta}$ in descending order, computed using the default number of significant digits. As shown in Fig. 2, $\Theta, \hat{\Theta} \in \mathbb{R}^{12\times 15}$ have no zero singular values. Hence, the ranks of $\Theta$ and $\hat{\Theta}$ are 12, and the solution spaces $S, \hat{S} \subset \mathbb{R}^{15}$ are three-dimensional subspaces. To investigate the influence of computational error, we performed the same experiment with various numbers of significant digits. Fig. 3 shows the relationship between the distance $d(S, \hat{S})$ and the number of significant digits. As shown in Fig. 3, the logarithm of the distance decreased in proportion to the number of significant digits. Therefore, we concluded that the nonzero $d(S, \hat{S})$ was caused by computational error and that our method correctly estimated the ARE.

IV-C Experiment 2

In this experiment, we confirmed that, if $Q$ and $R$ are diagonal, Theorem 1 can estimate the ARE with fewer observation data than system identification. We randomly set the matrices $A \in \mathbb{R}^{100\times 100}$ and $B \in \mathbb{R}^{100\times 50}$ by generating each element from the uniform distribution on $[-1, 1]$. After the generation, we confirmed that $(A, B)$ is controllable. The given $K \in \mathbb{R}^{50\times 100}$ is the solution of the LQR problem for diagonal matrices $Q$ and $R$ whose elements were generated from the uniform distribution on $[0.01, 1]$. We used $N_d = N_{d\min} = 102$ observation data that satisfy (9) for $N'_d = n = 100$. We generated the elements of $x_1(0), \ldots, x_{102}(0) \in \mathbb{R}^{100}$ and $u_{101}, u_{102} \in \mathbb{R}^{50}$ from the uniform distribution on $[-1, 1]$. The number of data used, 102, is less than the $n + m = 150$ required for system identification (11).

With this setting, the ARE (6) and our estimation (13) contained $N_{\rm ARE} = \frac{n(n+1)}{2} + mn = 10050$ and $N_{\rm est} = N_d N'_d - N'_d\left(N'_d - 1\right)/2 = 5250$ scalar equations, respectively. The variables of these scalar equations were the $N_v = \frac{n(n+1)}{2} + n + m = 5200$ independent elements of the symmetric matrix $P$ and the diagonal matrices $Q$ and $R$.

Because arbitrary-precision arithmetic was too slow for the scale of this experiment, we performed the computation using the default significant digits. We indexed the singular values of $\Theta \in \mathbb{R}^{10050\times 5200}$ and $\hat{\Theta} \in \mathbb{R}^{5250\times 5200}$ in descending order; Fig. 4 shows the 5180th–5200th singular values. From Fig. 4, we assumed that, without computational error, $\Theta$ and $\hat{\Theta}$ would each have one zero singular value. Then, the solution spaces $S, \hat{S} \subset \mathbb{R}^{5200}$ were one-dimensional subspaces, and the distance $d(S, \hat{S})$ was $4.3\times 10^{-10}$.

We compared our method with system identification. For system identification, we generated additional data $x_{103}, \ldots, x_{150}$ and $u_{103}, \ldots, u_{150}$ in the same manner as $x_{101}$ and $u_{101}$. Using the same approach as that for $S$, we defined the solution space $\hat{S}_{\rm SI} \subset \mathbb{R}^{N_v}$ of the ARE (6) with $A$, $B$, and $K$ identified by (11). Then, the distance $d(S, \hat{S}_{\rm SI})$ was $2.8\times 10^{-10}$. Therefore, our method estimated the ARE with the same order of error as system identification while using fewer observation data.

IV-D Experiment 3

In this experiment, we compared our method and system identification under practical conditions: a single noisy sequential dataset and sparse but non-diagonal $Q$ and $R$. We set the matrices $A \in \mathbb{R}^{40\times 40}$ and $B \in \mathbb{R}^{40\times 20}$ in the same manner as in Experiment 2. The control gain $K \in \mathbb{R}^{20\times 40}$ is the solution of the LQR problem for sparse matrices $Q$ and $R$. We performed three operations to generate $Q$ and $R$. First, we generated matrices $M_Q \in \mathbb{R}^{40\times 40}$ and $M_R \in \mathbb{R}^{20\times 20}$ in the same manner as $A$ and set $Q := M_Q^{\top} M_Q$ and $R := M_R^{\top} M_R$. Second, we randomly set 800 non-diagonal elements of $Q$ to 0 to make it sparse while maintaining symmetry. Similarly, we set 200 elements of $R$ to 0. Finally, we added a constant multiple of the identity matrix to each of $Q$ and $R$ so that their maximum eigenvalues are 10 times as large as their minimum ones.

Figure 5: Singular values of the coefficient matrices $\Theta$ and $\hat{\Theta}$ in Experiment 3.
Figure 6: Relationship between the noise variance $\sigma^2$ and the distances $d(S, \hat{S})$ (red “×”) and $d(S, \hat{S}_{\rm SI})$ (blue “+”) in Experiment 3.

To generate data, we used the following system:

$x^*(k+1) = A x^*(k) + B u^*(k), \qquad u^*(k) = -K x^*(k) + v(k),$ (33)

where $x^*(k) \in \mathbb{R}^n$ and $u^*(k) \in \mathbb{R}^m$ are the state and input at time $k \in \mathbb{Z}_+$, respectively. If $\|x^*(k)\|_2 \leq 1$, the vector $v(k) \in \mathbb{R}^m$ with norm 0.2 is obtained by scaling a random vector whose elements follow the uniform distribution on $[-1, 1]$; otherwise, $v(k) = 0$. We ran the system (33) for $N_d = 200$ time steps from a random initial state $x^*(0)$ satisfying $\|x^*(0)\|_2 \leq 1$ and obtained data as follows:

$x_k(0) = x^*(k-1) + \varepsilon_{\rm state}(k-1), \quad u_k = u^*(k-1) + \varepsilon_{\rm input}(k-1), \quad x_k(1) = x^*(k) + \varepsilon_{\rm state}(k),$ (34)

where \varepsilon_{\rm state}\in\mathbb{R}^{n} and \varepsilon_{\rm input}\in\mathbb{R}^{m} are observation noises whose elements follow a normal distribution with mean 0 and variance \sigma^{2}. We specify the value of \sigma^{2} later. Throughout the simulation, v(k)=0 holds 121 times; hence, N'_{d}=121. We sorted the data (x_{i}(0),u_{i},x_{i}(1)) (i\in\{1,\ldots,N_{d}\}) so that (9) holds if noise does not exist.
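The closed-loop simulation (33) and the noisy observations (34) can be sketched as follows. Here A, B, and K are random placeholders rather than Experiment 2's construction; the point of the sketch is the data pipeline, in particular that x_{k}(1) and x_{k+1}(0) share the same state-noise realization \varepsilon_{\rm state}(k).

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, Nd, sigma2 = 40, 20, 200, 1e-8

# Stand-in system and gain (assumed random placeholders, not the paper's
# Experiment 2 construction).
A = rng.standard_normal((n, n)) / np.sqrt(n)
B = rng.standard_normal((n, m)) / np.sqrt(n)
K = 0.1 * rng.standard_normal((m, n))

# noiseless closed-loop trajectory (33) with excitation v(k)
x0 = rng.standard_normal(n)
xs = [x0 / max(1.0, np.linalg.norm(x0))]      # ||x*(0)||_2 <= 1
us = []
for k in range(Nd):
    x = xs[-1]
    if np.linalg.norm(x) <= 1.0:
        v = rng.uniform(-1.0, 1.0, m)
        v *= 0.2 / np.linalg.norm(v)          # random direction, norm 0.2
    else:
        v = np.zeros(m)
    u = -K @ x + v
    us.append(u)
    xs.append(A @ x + B @ u)

# noisy observations (34): one noise draw per time step, shared between
# consecutive samples
eps_state = rng.normal(0.0, np.sqrt(sigma2), (Nd + 1, n))
eps_input = rng.normal(0.0, np.sqrt(sigma2), (Nd, m))
data = [(xs[k] + eps_state[k],
         us[k] + eps_input[k],
         xs[k + 1] + eps_state[k + 1]) for k in range(Nd)]
```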

Under the above conditions, the ARE (6) and our estimated equation (13) contained N_{\rm ARE}=\frac{n(n+1)}{2}+mn=1620 and N_{\rm est}=N_{d}N'_{d}-N'_{d}(N'_{d}-1)/2=16940 scalar equations, respectively. The variables of these scalar equations were the N_{v}=n(n+1)+\frac{m(m+1)}{2}-500=1350 independent elements of the symmetric matrix P and the sparse matrices Q and R.
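The counts above follow directly from the dimensions; a minimal check (each symmetric pair of zeroed off-diagonal entries removes one independent unknown, which gives the -500 term):

```python
n, m = 40, 20                 # state and input dimensions
Nd, Nd_prime = 200, 121       # total data and data with v(k) = 0
zeros_Q, zeros_R = 800, 200   # zeroed off-diagonal entries in Q and R

# scalar equations in the ARE (6): independent entries of the symmetric
# n-by-n block plus the full m-by-n block
N_ARE = n * (n + 1) // 2 + m * n
# scalar equations in the estimate (13), one per usable data pair
N_est = Nd * Nd_prime - Nd_prime * (Nd_prime - 1) // 2
# unknowns: independent entries of the symmetric P, Q, R minus the
# entries fixed to zero by sparsity (one per symmetric zero pair)
N_v = n * (n + 1) + m * (m + 1) // 2 - (zeros_Q + zeros_R) // 2
```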

We indexed the singular values of \Theta\in\mathbb{R}^{1620\times 1350} and \hat{\Theta}\in\mathbb{R}^{16940\times 1350} in descending order. Fig. 5 shows the 1330th–1350th singular values when \sigma^{2}=10^{-8}. Although no singular value is exactly zero due to noise, the last singular value is noticeably small, as shown in Fig. 5. Hence, we can conclude that the solution spaces S,\hat{S}\subset\mathbb{R}^{1350} were one-dimensional subspaces.
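This rank check can be reproduced in miniature: build a coefficient matrix with a known one-dimensional null space, perturb it slightly, and look for a pronounced gap before the last singular value. The sizes below are illustrative, not the paper's 16940-by-1350.

```python
import numpy as np

rng = np.random.default_rng(2)
rows, cols = 60, 30

# rank cols-1 matrix -> exactly one-dimensional null space before noise
Theta = rng.standard_normal((rows, cols - 1)) @ rng.standard_normal((cols - 1, cols))
Theta_noisy = Theta + 1e-9 * rng.standard_normal((rows, cols))

s = np.linalg.svd(Theta_noisy, compute_uv=False)   # descending order
# no singular value is exactly zero, but the last one is far smaller
gap = s[-2] / s[-1]
```

A large gap between s[-2] and s[-1] is the numerical signature of a one-dimensional null space, mirroring what Fig. 5 shows for \hat{\Theta}.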

We conducted experiments with noise variances \sigma^{2} from 10^{-6} to 10^{-16}. The noise seed differed for each experiment, but the other seeds were the same. Fig. 6 shows the relationship between the noise variance \sigma^{2} and the distances d(S,\hat{S}) and d(S,\hat{S}_{\rm SI}). As shown in Fig. 6, our method outperformed system identification in almost all experiments. In the experiments with variances from 10^{-8} to 10^{-16}, the ratio of d(S,\hat{S}) to d(S,\hat{S}_{\rm SI}) was approximately constant, with an average of 0.17. Because the maximum value of the distances is 1, we conclude that the estimations using both methods failed for noise variances larger than 10^{-8}.
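One common way to measure the distance between two one-dimensional subspaces, consistent with the stated maximum of 1, is the sine of their principal angle. The paper's exact definition of d(S,\hat{S}) may differ, so the following is only an illustrative sketch:

```python
import numpy as np

def subspace_distance(a, b):
    """Distance between the 1-D subspaces spanned by a and b, taken here
    as the sine of the principal angle between them (an assumption; the
    paper's d may be defined differently). Always lies in [0, 1]."""
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    c = abs(a @ b)                       # cosine of the principal angle
    return np.sqrt(max(0.0, 1.0 - c * c))
```

This distance is invariant to the sign and scale of the spanning vectors, which is what makes it a distance between subspaces rather than between vectors.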

V CONCLUSIONS

In this paper, we proposed a method to estimate the ARE with respect to an unknown discrete-time system from input and state observation data. Our method transforms the ARE into a form that can be calculated without the system model by multiplying both sides by the observation data. We proved that our estimated equation is equivalent to the ARE if the data are sufficient for system identification. The main feature of our method is the direct estimation of the ARE without identifying the system. This feature enables us to economize the observation data using prior information about the objective function. We conducted a numerical experiment that demonstrated that our method requires less data than system identification if the prior information is sufficient.


SUPPLEMENTARY MATERIAL

We give another proof of Theorem 1 using the following Bellman equation:

x(k)^{\top}Px(k) = x(k+1)^{\top}Px(k+1) + x(k)^{\top}Qx(k) + x(k)^{\top}K^{\top}RKx(k). (1)
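As a quick numerical sanity check (a sketch, not the paper's code), one can solve the ARE for a small random system by value iteration and confirm that (1) holds along a closed-loop step:

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 4, 2
A = 0.3 * rng.standard_normal((n, n))   # small illustrative system
B = rng.standard_normal((n, m))
Q, R = np.eye(n), np.eye(m)

# solve the discrete ARE by value iteration (converges here since (A, B)
# is controllable almost surely and Q > 0)
P = np.eye(n)
for _ in range(500):
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    P = A.T @ P @ A + Q - A.T @ P @ B @ K
K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)   # LQR gain

# Bellman identity (1) on one closed-loop step x(k+1) = (A - B K) x(k)
x = rng.standard_normal(n)
x_next = (A - B @ K) @ x
lhs = x @ P @ x
rhs = x_next @ P @ x_next + x @ Q @ x + x @ K.T @ R @ K @ x
```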

To this end, we prove the following theorem equivalent to Theorem 1.

Theorem 1’: Consider Problem 1. Let i,j\in\{1,\ldots,N_{d}\} be arbitrary natural numbers. Suppose that the following ARE holds:

A^{\top}PA - P + Q - K^{\top}(R+B^{\top}PB)K = 0,
B^{\top}PA - (R+B^{\top}PB)K = 0. (2)

Then, we have f_{i,j}=0 if u_{i}=-Kx_{i}(0) holds, where f_{i,j} is defined as

f_{i,j} = x_{i}(1)^{\top}Px_{j}(1) + x_{i}(0)^{\top}(Q-P)x_{j}(0) + u_{i}^{\top}Ru_{j}. (3)
Proof:

From the Bellman equation (1), we readily obtain f_{i,j}=0 if i=j. Next, suppose i\neq j and that u_{j}=-Kx_{j}(0) also holds. Then, we have [x_{i}(1)+x_{j}(1)] = A[x_{i}(0)+x_{j}(0)] + B(u_{i}+u_{j}) and u_{i}+u_{j} = -K[x_{i}(0)+x_{j}(0)]. Hence, from the Bellman equation (1), we obtain

0 = [x_{i}(1)+x_{j}(1)]^{\top}P[x_{i}(1)+x_{j}(1)] + [x_{i}(0)+x_{j}(0)]^{\top}(Q-P)[x_{i}(0)+x_{j}(0)] + (u_{i}+u_{j})^{\top}R(u_{i}+u_{j}) = f_{i,i}+f_{i,j}+f_{j,i}+f_{j,j}. (4)

Because f_{i,i}=f_{j,j}=0 and f_{i,j}=f_{j,i}, we obtain

[u_{i}=-Kx_{i}(0),\ u_{j}=-Kx_{j}(0)] \Rightarrow f_{i,j}=0. (5)

Finally, we consider the case without the assumption u_{j}=-Kx_{j}(0). Let \Delta u_{j}=u_{j}+Kx_{j}(0). Substituting x_{j}(1)=Ax_{j}(0)+Bu_{j} and u_{j}=\Delta u_{j}-Kx_{j}(0) into (3), we have

f_{i,j} = x_{i}(1)^{\top}P(A-BK)x_{j}(0) + x_{i}(1)^{\top}PB\Delta u_{j} + x_{i}(0)^{\top}(Q-P)x_{j}(0) - u_{i}^{\top}RKx_{j}(0) + u_{i}^{\top}R\Delta u_{j}. (6)

From (5), the sum of the first, third, and fourth terms on the right-hand side of (6) is zero. Hence, substituting x_{i}(1)=Ax_{i}(0)+Bu_{i} and u_{i}=-Kx_{i}(0) into (6), we have

f_{i,j} = x_{i}(0)^{\top}[-K^{\top}(B^{\top}PB+R) + A^{\top}PB]\Delta u_{j}. (7)

From K=(R+B^{\top}PB)^{-1}B^{\top}PA, we obtain f_{i,j}=0, which proves Theorem 1’. ∎
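Theorem 1’ is also easy to verify numerically. The following sketch (a small random system with a value-iterated P, not the paper's code) checks that f_{i,j}=0 whenever u_{i}=-Kx_{i}(0), regardless of u_{j}:

```python
import numpy as np

rng = np.random.default_rng(4)
n, m = 4, 2
A = 0.3 * rng.standard_normal((n, n))
B = rng.standard_normal((n, m))
Q, R = np.eye(n), np.eye(m)

# P solving the ARE (2), obtained by value iteration
P = np.eye(n)
for _ in range(500):
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    P = A.T @ P @ A + Q - A.T @ P @ B @ K
K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

def f(xi0, ui, xj0, uj):
    """f_{i,j} of (3), with x(1) produced by the true dynamics."""
    xi1, xj1 = A @ xi0 + B @ ui, A @ xj0 + B @ uj
    return xi1 @ P @ xj1 + xi0 @ (Q - P) @ xj0 + ui @ R @ uj

xi0, xj0 = rng.standard_normal(n), rng.standard_normal(n)
uj = rng.standard_normal(m)          # datum j need not follow the gain
fij = f(xi0, -K @ xi0, xj0, uj)      # ~ 0 by Theorem 1'
```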