
Department of Mechanical and Automation Engineering, The Chinese University of Hong Kong, HKSAR, China

On Estimating the Probabilistic Region of Attraction for Partially Unknown Nonlinear Systems: A Sum-of-Squares Approach

Hejun Huang (hjhuang@mae.cuhk.edu.hk)    Dongkun Han (dkhan@mae.cuhk.edu.hk)
Abstract

Estimating the region of attraction for partially unknown nonlinear systems is a challenging problem. In this paper, we propose a tractable method to generate an estimated region of attraction with probability bounds by searching for an optimal polynomial barrier function. Techniques of Chebyshev interpolants, Gaussian processes and sum-of-squares programming are used in this paper. To approximate the unknown non-polynomial dynamics, a polynomial mean function of a Gaussian process model is computed to represent the exact dynamics based on Chebyshev interpolants. Furthermore, probabilistic conditions are given such that all the estimates are located within certain probability bounds. Numerical examples are provided to demonstrate the effectiveness of the proposed method.

keywords:
Region of attraction, Sum-of-squares programming, Chebyshev interpolants, Gaussian processes.

1 Introduction

Tracking the performance of uncertain nonlinear systems is an essential problem of significant research interest. In engineering practice, the concepts of safety and stability are central to such uncertain nonlinear systems in most scenarios, e.g., flight dynamics, bipedal robotics and power systems [1, 2, 3, 4]. Consequently, researchers across multiple disciplines have formulated this type of problem as the analysis of the region of attraction (ROA, also called domain of attraction). Estimating the ROA directly yields the safety or stability margin in many practical implementations [5].

Fruitful results have been obtained for estimating the ROA of fixed nonlinear systems. Exploiting the sublevel sets of Lyapunov functions is one useful method [6]. Barrier functions are another powerful tool to guarantee safety, ensuring that states never enter a specified unsafe region [7], [8]. To meet the safety and stability conditions simultaneously, quadratic programs have been investigated to find qualified Lyapunov barrier functions [9]. Meanwhile, sum-of-squares programs (SOSPs) have been proposed to find more permissive results for polynomial systems [10, 11, 12, 13]. However, in many engineering practices, model inaccuracies and unknown disturbances may influence the system dynamics or, in the worst case, even cause operation failures.

To handle the above issue, effective methods have been proposed for partially unknown systems. First, Lyapunov-based methods have been developed and extended to uncertain systems, where a Lyapunov-certified ROA (LCROA) estimation is conducted for uncertain polynomial systems [14], [15]. Furthermore, learning-based methods have been studied in the literature. Among these, Gaussian processes (GP) have been shown to be a promising approach to quantify the uncertainty of a stochastic process [16, 17]. The idea of unifying SOSP and GP naturally arises to compute the LCROA [18, 19, 20].

Inspired by the work in [21], which first uses GP to compute the barrier-certified ROA (BCROA), and based on our previous work in [13, 22], this paper combines SOSPs and GP to estimate the optimal BCROA of partially unknown nonlinear systems. Different from the learned polynomial systems in [20], we use Chebyshev interpolants and polynomial mean functions of GP models to find the optimal BCROA rather than the LCROA, under relaxed assumptions. The polynomial mean function is shown to be flexible enough to match nonlinear kernels other than the polynomial kernel. To the best of our knowledge, this paper is the first to compute barrier functions via SOSPs for partially unknown nonlinear systems.

The main contributions of this paper are threefold. First, a learned polynomial system is built with probability bounds. Second, a positive but safe sample policy is developed to prepare appropriate prior information for a more accurate GP model. Third, a tractable method based on SOSPs is proposed to compute the probabilistic optimal BCROA.

2 Preliminary

Consider an autonomous system as follows,

\dot{x}=\underbrace{f(x)+g(x)}_{\text{Known}}+\underbrace{d(x)}_{\text{Unknown}}, \qquad (1)

where $x\in\mathcal{X}\subseteq\mathbb{R}^{n}$ denotes the state, $f,g:\mathbb{R}^{n}\rightarrow\mathbb{R}^{n}$ denote the polynomial and the non-polynomial term respectively, and $d:\mathbb{R}^{n}\rightarrow\mathbb{R}^{n}$ denotes the unknown term. All the terms in (1) are Lipschitz continuous.

Without loss of generality, systems with a single equilibrium are considered, and this equilibrium can be shifted to the origin via a change of variables [23].

Assumption 1.

The origin ($x=0$) is a single stable equilibrium of (1), that is, $f(0)=g(0)=d(0)=0$. $\hfill\square$

To model the system dynamics in a Bayesian framework as developed in [24], we define a prior distribution of the noise $[\epsilon_{1},\epsilon_{2},\dots,\epsilon_{k}]^{\mathrm{T}}$ over $k$ measurements $[\dot{x}_{1},\dot{x}_{2},\dots,\dot{x}_{k}]^{\mathrm{T}}$.

Assumption 2.

The noise $[\epsilon_{1},\epsilon_{2},\dots,\epsilon_{k}]^{\mathrm{T}}$ over the system measurements $[\dot{x}_{1},\dot{x}_{2},\dots,\dot{x}_{k}]^{\mathrm{T}}$ of (1) is uniformly bounded by $\sigma_{n}$, i.e., $\{\epsilon_{i}\sim\mathcal{N}(0,\sigma_{n}^{2})\}_{i=1}^{k}$. $\hfill\square$

The known dynamics $f(x)+g(x)$ can be computed directly. Thus, $d(x)$ can be obtained by subtracting $f(x)+g(x)$ from $\dot{x}$. In this work, we restrict our attention to the system (1) with bounded $d(x)$:

Assumption 3.

The unknown term $d(x)$ in (1) has a bounded norm in the reproducing kernel Hilbert space (RKHS), i.e., $\|d(x)\|\leq c_{g}$, where $c_{g}$ is a constant. $\hfill\square$

As introduced in [21], the RKHS is a Hilbert space of square-integrable functions that contains functions of the form $l(x)=\sum_{i}\alpha_{i}k(x,x_{i})$, where $\alpha_{i}\in\mathbb{R}$ denotes a coefficient and $k:\mathcal{X}\times\mathcal{X}\rightarrow\mathbb{R}^{+}_{0}$ denotes a symmetric positive definite kernel function of states $x,x_{i}$. For more details about the RKHS norm, we kindly refer interested readers to [25].
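As a small illustration of this function class (not taken from the paper), the following sketch builds a member $l(x)=\sum_{i}\alpha_{i}k(x,x_{i})$ of an RBF-kernel RKHS; the centers, coefficients and hyperparameters are arbitrary choices used only for illustration.

```python
import numpy as np

def rbf_kernel(x, xi, length_scale=0.5, signal_var=1.0):
    """Symmetric positive-definite RBF kernel k(x, xi)."""
    return signal_var * np.exp(-np.sum((x - xi) ** 2) / (2.0 * length_scale ** 2))

# Illustrative centers x_i and coefficients alpha_i (arbitrary values).
centers = np.array([[0.0, 0.0], [0.5, -0.3], [-0.2, 0.4]])
alphas = np.array([1.0, -0.7, 0.3])

def l(x):
    """A member of the RKHS: l(x) = sum_i alpha_i * k(x, x_i)."""
    return sum(a * rbf_kernel(x, xi) for a, xi in zip(alphas, centers))

print(l(np.array([0.1, 0.1])))
```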

2.1 Barrier Functions

To introduce the barrier function, let us first consider a simple autonomous system as follows,

\dot{x}=f(x), \qquad (2)

where $f(x)$ is locally Lipschitz continuous with a single stable equilibrium point at the origin. As mentioned above, the safety and stability of (2) can be guaranteed by the ROA.

State trajectories starting inside the barrier-function-certified ROA (BCROA) $\mathcal{L}=\{x\in\mathcal{X}\,|\,h(x)\geq 0\}$ never enter the unsafe region $\mathcal{L}_{U}=\mathcal{X}\backslash\mathcal{L}$, as defined by

\begin{aligned}
\forall x\in\mathcal{L}\quad & h(x)\geq 0,\\
\forall x\in\mathcal{L}\quad & \frac{\partial h}{\partial x}f(x)\geq 0,\\
\forall x\in\mathcal{L}_{U}\quad & h(x)<0.
\end{aligned} \qquad (3)

Meanwhile, the Lyapunov function $V(x)$ certified ROA (LCROA) $\mathcal{L}_{V}=\{x\in\mathcal{X}\,|\,V(x)\leq c\}$ denotes a sublevel set of $V(x)$. State trajectories inside $\mathcal{L}_{V}$ always converge to the origin. The following lemma gives the relationship between $\mathcal{L}$ and $\mathcal{L}_{V}$.

Lemma 1.

(Lemma 3.2 of [13]) Given an autonomous dynamical system (2) that is asymptotically stable at the origin, the estimate of the ROA by a barrier function is no smaller than the estimate by a Lyapunov function, i.e., $\mathcal{L}_{V}\subseteq\mathcal{L}$. $\hfill\square$

The details of how to obtain the maximum Lyapunov sublevel $c^{*}$ and an optimal barrier function $h^{*}(x)$ are introduced in [13]. Example 1 illustrates the comparison of the LCROA and the BCROA.

Example 1:  Consider an autonomous system

\begin{bmatrix}\dot{x}_{1}\\ \dot{x}_{2}\end{bmatrix}=\begin{bmatrix}-x_{1}^{3}-x_{1}x_{2}^{2}\\ -x_{2}-x_{1}^{2}x_{2}\end{bmatrix}. \qquad (4)

Let $\mathcal{A}_{1}$ denote the LCROA and $\mathcal{A}_{2}$ denote the BCROA. In (4), $\mathcal{A}_{1}=\{x\,|\,V_{1}(x)\leq 1.063\}$ is established by a $4^{th}$ degree polynomial $V_{1}(x)=x_{1}^{4}+x_{2}^{4}+x_{1}^{2}x_{2}^{2}+x_{1}^{2}+x_{2}^{2}+x_{1}x_{2}$, and $\mathcal{A}_{2}=\{x\,|\,h_{1}^{*}(x)\geq 0\}$ is obtained by another $4^{th}$ degree polynomial barrier function $h_{1}^{*}(x)$ based on $\mathcal{A}_{1}$.

Figure 1: Comparison of $\mathcal{A}_{1}$ and $\mathcal{A}_{2}$. The red region enclosed by the dashed blue ellipse depicts $\mathcal{A}_{1}$, while the light blue region enclosed by the solid black line depicts $\mathcal{A}_{2}$.

In Figure 1, $\mathcal{A}_{2}$ is significantly larger than $\mathcal{A}_{1}$, which means the former allows more states to be considered in (4). In this paper, we will leverage this superiority of the barrier function to obtain a more permissive safety and stability guarantee.
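As a quick numerical illustration of the conditions in (3), the sketch below grid-checks a hypothetical candidate $h(x)=1-x_{1}^{2}-x_{2}^{2}$ (not the $h_{1}^{*}(x)$ computed in the paper) against the dynamics (4). A grid check is of course no substitute for an SOS certificate; it merely reports the smallest value of $\frac{\partial h}{\partial x}f(x)$ found over sampled points of $\mathcal{L}$.

```python
import numpy as np

def f(x1, x2):
    """Dynamics of Example 1, Eq. (4)."""
    return np.array([-x1**3 - x1 * x2**2, -x2 - x1**2 * x2])

def h(x1, x2):
    """Hypothetical candidate barrier (illustration only, not h1* from the paper)."""
    return 1.0 - x1**2 - x2**2

def grad_h(x1, x2):
    return np.array([-2.0 * x1, -2.0 * x2])

# Grid-check the first two conditions of (3) on the set L = {x | h(x) >= 0}.
xs = np.linspace(-1.5, 1.5, 201)
worst_hdot = np.inf
for x1 in xs:
    for x2 in xs:
        if h(x1, x2) >= 0.0:                       # x in L
            hdot = grad_h(x1, x2) @ f(x1, x2)      # (dh/dx) f(x)
            worst_hdot = min(worst_hdot, hdot)

print(f"min of (dh/dx) f(x) over grid points in L: {worst_hdot:.3e}")
# A non-negative value is consistent with (3); an SOSP would certify it exactly.
```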

2.2 Gaussian Processes

Gaussian processes (GP) provide a non-parametric regression method to capture unknown dynamics in a Bayesian framework. In a GP model, every variable is associated with a normal distribution, and the prior and posterior GP models of these variables obey a joint Gaussian distribution [24]. A GP model is characterized by the mean function $m(x)$ and the covariance function $k(x,x^{\prime})$, where $k(x,x^{\prime})$ measures the similarity between any two states $x,\,x^{\prime}\in\mathcal{X}$. In most cases, the covariance function $k(x,x^{\prime})$ is also called a kernel. The measurements of $d(x)$ in (1) can be used to construct a GP model with a polynomial mean function as follows.

Proposition 1.

Suppose there exist $k$ measurements of $d(x)$ in (1) that satisfy Assumptions 2 and 3. Then, the following GP model of $d(x)$ can be established with polynomial mean function $m(x_{*})$ and covariance function $k(x,x_{*})$ as

\begin{aligned}
m(x_{*}) &= \varphi(x_{*})^{\mathrm{T}}w,\\
k(x,x_{*}) &= k(x_{*},x_{*})-k_{*}^{\mathrm{T}}(K+\sigma_{n}^{2}I)^{-1}k_{*},
\end{aligned} \qquad (5)

where $x_{*}$ is a query state, $\varphi(x_{*})$ is a monomial vector, $w$ is a coefficient vector, $[K]_{(i,j)}=k(x_{i},x_{j})$ is a kernel Gramian matrix and $k_{*}=[k(x_{1},x_{*}),k(x_{2},x_{*}),\dots,k(x_{k},x_{*})]^{\mathrm{T}}$.

Proof.

See Appendix. ∎

Remark 1.

Proposition 1 allows us to choose the degree of the polynomial $m(x_{*})$ in (5). Besides, it establishes a feasible link between polynomial mean functions and other flexible kernels beyond the polynomial kernel [20]. $\hfill\square$
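A minimal sketch of the quantities in (5) follows, assuming the weights $w$ are obtained by regularized least squares over monomial features (in the spirit of the appendix derivation) and an RBF kernel is used for the covariance; the toy data and hyperparameters are illustrative, not from the paper.

```python
import numpy as np
from itertools import combinations_with_replacement

def monomials(x, degree):
    """Monomial feature vector phi(x) up to a given total degree."""
    feats = [1.0]
    for d in range(1, degree + 1):
        for idx in combinations_with_replacement(range(len(x)), d):
            feats.append(np.prod(x[list(idx)]))
    return np.array(feats)

def rbf(a, b, sv=1.0, ls=0.5):
    return sv * np.exp(-np.sum((a - b) ** 2) / (2 * ls**2))

# Toy measurements (x_i, y_i) of the unknown term d(x); values are illustrative.
X = np.random.uniform(-1, 1, size=(30, 2))
y = np.array([np.sin(x[0]) * x[1] for x in X]) + 0.01 * np.random.randn(30)

sigma_n, sigma_p, deg = 0.01, 1.0, 3
Phi = np.vstack([monomials(x, deg) for x in X])                   # k x dim(phi)
w = np.linalg.solve(Phi.T @ Phi + (sigma_n / sigma_p) ** 2 * np.eye(Phi.shape[1]),
                    Phi.T @ y)                                    # regularized LS weights

K = np.array([[rbf(a, b) for b in X] for a in X])
x_star = np.array([0.2, -0.3])
k_star = np.array([rbf(x, x_star) for x in X])

m_star = monomials(x_star, deg) @ w                               # polynomial mean, Eq. (5)
var_star = rbf(x_star, x_star) - k_star @ np.linalg.solve(K + sigma_n**2 * np.eye(len(X)), k_star)
print(m_star, var_star)
```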

3 Learning the Partially Unknown System

To learn the partially unknown dynamics (1) in polynomial form, we propose the approximated probabilistic model (12), which combines Chebyshev interpolants and a GP model, as shown in the first two subsections. We also introduce a covariance-oriented safe sample policy based on GP to learn the dynamics consistently in the third subsection.

3.1 Chebyshev Interpolants

Chebyshev interpolants provide a useful way to approximate a class of nonlinear functions with a bounded remainder [26]. The term $g(x)$ in (1) can be approximated by the Chebyshev interpolant $P_{k}$ of degree $k$ in $[-1,1]$ as

g(x)\approx P_{k}(x) = \sum_{i=0}^{k}\alpha_{i}\tau_{i}(x)=\left\langle A,T\right\rangle,\quad i\in[0,k], \qquad (6)

where $i$ denotes the index in the coefficient vector $A=[\alpha_{0},\alpha_{1},\dots,\alpha_{k}]$ and the Chebyshev polynomial vector $T=[\tau_{0},\tau_{1},\dots,\tau_{k}]$, which satisfy

\begin{aligned}
\alpha_{i}(x) &= \left\{\begin{aligned} &\frac{1}{\pi}\int_{-1}^{1}\frac{g(x)\tau_{i}(x)}{\sqrt{1-x^{2}}}\,dx, && i=0,\\ &\frac{2}{\pi}\int_{-1}^{1}\frac{g(x)\tau_{i}(x)}{\sqrt{1-x^{2}}}\,dx, && i\in[1,k],\end{aligned}\right.\\
\tau_{i}(x) &= \cos(i\arccos(x)),\qquad i\in[0,k].
\end{aligned} \qquad (7)

Note that Chebyshev interpolants are applicable to an arbitrary interval $I=[a,b]$ by the following transformation,

I(x)=\frac{2x-(b+a)}{b-a}. \qquad (8)

We define the remainder $\xi=g(x)-P_{k}(x)$ based on (6). The following result from [26] shows that the remainder $\xi$ of the Chebyshev interpolant is bounded over the domain.

Lemma 2.

(Theorem 8.2 of [26]) Let $g(x)$ be analytic in $[-1,1]$ and analytically continuable to the open Bernstein ellipse $E$, where it satisfies $|g(x)|\leq m$ for some $m$. Then its Chebyshev interpolant $P_{k}(x)$ satisfies $\|g(x)-P_{k}(x)\|\leq\frac{4m\rho^{-k}}{\rho-1}$ for $k\geq 0$. $\hfill\square$

The Bernstein ellipse $E$ has foci $\pm 1$ and major radius $1+\rho$ for all $\rho\geq 0$. For more details about Bernstein ellipses, we kindly refer to the book by Trefethen [26].

Based on Lemma 2, the system (1) can be expressed as,

\dot{x} = f(x)+P_{k}(x)+d_{\xi}(x), \qquad (9)

where the unknown term $d_{\xi}(x)$ satisfies Assumptions 2 and 3,

d_{\xi}(x)=d(x)+\xi. \qquad (10)

The approximation accuracy of Chebyshev interpolants improves with the degree. We illustrate a comparison in the following example.

Example 2:  Consider the nonlinear function $y=\sqrt{|x-3|}$ approximated by Chebyshev interpolants of degree $4$ and $80$ in $[0,6]$. The higher-degree approximation is more accurate, with a smaller bounded remainder $\xi$, as shown in Figure 2.

Figure 2: The red dotted line and the blue solid line depict the Chebyshev interpolants of degree $4$ and $80$, respectively. (a) The black dashed line depicts the real value of $y$. (b) The black dashed lines denote the boundaries of the Chebyshev interpolant remainders $\xi$.
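For concreteness, the sketch below reproduces the flavor of Example 2 with NumPy's Chebyshev utilities, sampling $g$ at Chebyshev points mapped through the transformation (8); the exact remainder values depend on the evaluation grid.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

def cheb_interpolant(g, a, b, deg):
    """Chebyshev interpolant of g on [a, b]: sample g at mapped Chebyshev points
    and fit a Chebyshev series, using the transformation I(x) of Eq. (8)."""
    t = np.cos(np.pi * (np.arange(deg + 1) + 0.5) / (deg + 1))   # Chebyshev points in [-1, 1]
    x = 0.5 * (b - a) * t + 0.5 * (b + a)                        # inverse of Eq. (8)
    coeffs = C.chebfit(t, g(x), deg)
    return lambda z: C.chebval((2 * z - (b + a)) / (b - a), coeffs)

g = lambda x: np.sqrt(np.abs(x - 3.0))
p4, p80 = cheb_interpolant(g, 0.0, 6.0, 4), cheb_interpolant(g, 0.0, 6.0, 80)

z = np.linspace(0.0, 6.0, 2001)
print("max |remainder|, degree 4 :", np.max(np.abs(g(z) - p4(z))))
print("max |remainder|, degree 80:", np.max(np.abs(g(z) - p80(z))))
```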

3.2 Probability Bounds on the GP model

The following result provides the probability bounds in the distribution of a GP model.

Lemma 3.

(Lemma 2 of [17]) Suppose there exist $k$ measurements $\{(x_{1},y_{1}),(x_{2},y_{2}),\dots,(x_{k},y_{k})\}$ of a bounded function $h(x)\in\mathcal{H}$, where $x$ denotes the state, $y$ denotes the measurement of $h(x)$ corrupted by noise $\{\epsilon_{1},\epsilon_{2},\dots,\epsilon_{k}\}\sim\mathcal{N}(0,\sigma_{n}^{2})$, and $\mathcal{H}$ denotes the RKHS. Let $\delta\in(0,1)$. For $h(x)$ and its inferred GP model $(m_{h}(x),\sigma^{2}_{h}(x))$ it holds that

\mathbb{P}\{\|h(x)-m_{h}(x)\|\leq\|\beta\|\|\sigma_{h}(x)\|,\ \forall x\in\mathcal{X}\}\geq(1-\delta)^{k}, \qquad (11)

where $[\beta]_{k}=\sqrt{2\lVert h_{k}\rVert^{2}+300\gamma_{k}\log^{3}(k/\delta)}$ is a discounting factor, $\|\cdot\|$ is a norm of $\mathcal{H}$, and $\gamma_{k}$ is a factor denoting the maximum mutual information over the measurements. $\hfill\square$

Remark 2.

The value of $\gamma_{k}$ is related to the type of kernel, e.g., the RBF kernel, Matérn kernel and linear kernel from [27]. Thus, its sublinearly dependent term $\beta_{k}$ can be regarded as a constant [18]. A related statement in [17] shows that with more appropriate prior information, the value of $\sigma_{h}(x)$ decreases, so the dynamics $h(x)$ can be represented by $m_{h}(x)$ with smaller error. $\hfill\square$

The unknown term $d_{\xi}(x)$ in (10) can be learned by a GP model. Combining the previous results for the system (9), the probabilistic statement toward the exact dynamics,

\dot{x}=f(x)+P_{k}(x)+m_{d_{\xi}}(x), \qquad (12)

holds with probability greater than or equal to $(1-\delta)^{k},\ \delta\in(0,1)$. Therefore, we obtain a high-probability statement (12) of (1) based on Lemma 3, which will be needed in the analysis of the proposed ROA estimate.
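For a rough feel of the quantities in Lemma 3, the snippet below evaluates the discounting factor $[\beta]_{k}$ and the probability lower bound $(1-\delta)^{k}$ for illustrative values of the RKHS norm bound and the information gain $\gamma_{k}$, both of which are problem-dependent inputs assumed to be given.

```python
import numpy as np

# Illustrative values: the RKHS norm bound, the information gain gamma_k,
# and delta are assumed to be supplied (they are problem-dependent quantities).
k, delta = 50, 0.01
h_norm, gamma_k = 2.0, 5.0

beta_k = np.sqrt(2 * h_norm**2 + 300 * gamma_k * np.log(k / delta) ** 3)
prob_lower = (1 - delta) ** k          # probability lower bound in (11)/(12)

print(f"beta_k = {beta_k:.2f}, probability lower bound = {prob_lower:.4f}")
```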

3.3 Covariance Oriented Safe Sample Policy

Computing a high-confidence GP model is closely related to appropriate prior information. If the mean function $m(x)$ of a GP model gets closer to the prior information, $k(x,x_{*})$ decreases and the model approaches the exact dynamics [24]. Therefore, handling the high-covariance data accelerates the learning process. Meanwhile, to keep the prior information always safe, we propose a positive sample policy inside an existing ROA as follows,

x^{*}=\mathop{\arg\max}\limits_{x,\ \psi(t;d_{\xi}(x))\in\mathcal{L}_{k}}\ \sum\limits_{k=0}^{K}k(x,\psi(t;d_{\xi}(x))), \qquad (13)

where $\sum k(x,\psi(t;d_{\xi}(x)))$ denotes the accumulated covariance along a trajectory $\psi(t;d_{\xi}(x))$ starting from $x\in\mathcal{L}_{k}$, $\mathcal{L}_{k}$ denotes the estimated ROA, and $k$ denotes the episode index out of the total $K$ episodes. The trajectory $\psi(t;d_{\xi}(x^{*}))$ obtained from (13) can be used to enrich the GP prior model recursively; a minimal sketch of this selection step is given below. The following example then compares the dynamics of the original system (1) and a learned system (12).
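In the sketch below, `simulate`, `gp_posterior_var` and `in_roa` are hypothetical user-supplied helpers standing in for the trajectory rollout, the GP posterior variance of (5), and the membership test for the current estimate $\mathcal{L}_{k}$; only the selection logic of (13) is shown.

```python
import numpy as np

def select_sample(candidates, simulate, gp_posterior_var, in_roa):
    """Sketch of the policy (13): among candidate initial states inside the current
    estimated ROA, pick the one whose simulated trajectory accumulates the largest
    GP posterior variance. All three callables are hypothetical helpers."""
    best_x, best_score = None, -np.inf
    for x0 in candidates:
        if not in_roa(x0):
            continue                                  # only sample safely, inside L_k
        traj = simulate(x0)                           # states psi(t; d_xi(x0))
        score = sum(gp_posterior_var(x) for x in traj)
        if score > best_score:
            best_x, best_score = x0, score
    return best_x
```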

Example 3:  Consider the partially unknown nonlinear system (14) with unknown term $d(x)$:

\begin{bmatrix}\dot{x}_{1}\\ \dot{x}_{2}\end{bmatrix}=\begin{bmatrix}-x_{1}+x_{2}\\ x_{1}^{2}x_{2}+1-\sqrt{|\exp(x_{1})\cos(x_{1})|}\end{bmatrix}+\begin{bmatrix}0\\ d(x)\end{bmatrix}. \qquad (14)
Figure 3: (a) The left schematic shows the exact ROA of (14) with $d(x)=0$. (b) The right schematic shows the exact ROA of a learned polynomial system. The blue solid line depicts the ROA boundary, and the red filled circle depicts the equilibrium point.
Figure 4: The result of $\sum k(x,\psi(t;d_{\xi}(x)))$ in Figure 3(b). The red star marks the most uncertain point with the maximum value of $\sum k$, the blue solid points denote the values of $\sum k$ starting from the red solid points in the $x_{1},x_{2}$ plane, and the solid blue ellipse denotes an estimated ROA.

Figure 3(b) shows the learned polynomial dynamics of (14). The non-polynomial term in (14) is approximated by a $4^{th}$ degree Chebyshev interpolant in $[-2,2]^{2}$, while the state trajectory $\psi(t;d_{\xi}(-0.05,-0.05))$ is used to construct a GP model with a $3^{rd}$ degree polynomial mean function. Figure 4 shows the sum of covariances of the learned polynomial system obtained from (13). Compared to the points around the equilibrium, those near the ROA boundary have a larger sum of covariances and can be considered highly uncertain. Thus, the distribution of the sum of covariances in Figure 4 indicates a direction for reducing the uncertainty during the learning process.

4 Estimating the Region of Attraction

In this section, we estimate the ROA of the learned polynomial system (12) via SOSPs. The relationship between the estimates for the learned polynomial system and for the true dynamics will also be discussed.

4.1 Parameterize the ROA by Barrier Function

The parameterization of a barrier function for (12) via SOSPs is developed in [13]. To deal with the non-negativity constraints, we introduce the Positivstellensatz (P-satz) here for the subsequent SOSP implementation.

Let $\mathcal{P}$ be the set of polynomials and $\mathcal{P}^{\text{SOS}}$ be the set of sum-of-squares polynomials, e.g., $P(x)=\sum_{i=1}^{k}p_{i}^{2}(x)$, where $p_{i}(x)\in\mathcal{P}$ and $P(x)\in\mathcal{P}^{\text{SOS}}$.

Lemma 4.

([28]) For polynomials $\{a_{i}\}_{i=1}^{m}$, $\{b_{j}\}_{j=1}^{n}$ and $p$, define a set $\mathcal{B}=\{x\in\mathcal{X}:\{a_{i}(x)\}_{i=1}^{m}=0,\{b_{j}(x)\}_{j=1}^{n}\geq 0\}$. Let $\mathcal{B}$ be compact. Then $\forall x\in\mathcal{B},\ p(x)\geq 0$ holds if

\left\{\begin{array}{l}\exists\, r_{1},\dots,r_{m}\in\mathcal{P},\ s_{1},\dots,s_{n}\in\mathcal{P}^{\text{SOS}},\\ p-\sum^{m}_{i=1}r_{i}a_{i}-\sum^{n}_{j=1}s_{j}b_{j}\in\mathcal{P}^{\text{SOS}}.\end{array}\right. \hfill\square

This lemma states that any such polynomial $p$ lies in the cone generated by the polynomials $\{a_{i}\}_{i=1}^{m}$ and $\{b_{j}\}_{j=1}^{n}$. Using Lemma 4, the optimal barrier function $h^{*}(x)$ of the learned polynomial system (12) can be searched over the state space via three SOSPs.

1. Obtain the maximum sublevel set $\{x\,|\,V(x)\leq c^{*}\}$ of a specified Lyapunov function $V(x)$ by a bilinear search,

\begin{aligned}
c^{*}= \underset{c\in\mathbb{R}^{+},\,L_{c}(x)\in\mathcal{P}^{\text{SOS}}}{\max}\ & c\\
\text{s.t.}\ \ -\frac{\partial V(x)}{\partial x}\dot{x}-L_{c}(x)(c-V(x)) & \in\mathcal{P}^{\text{SOS}},
\end{aligned} \qquad (15)

where $L_{c}(x)$ is an auxiliary factor to relax the non-negativity constraint for the initial barrier function $h(x)$.

2. Search for two further auxiliary factors $L_{1}(x),L_{2}(x)$ for $h(x)$,

\begin{aligned}
\exists\; L_{1}(x),\,L_{2}(x) & \in\mathcal{P}^{\text{SOS}}\\
\text{s.t.}\ \ -\frac{\partial V(x)}{\partial x}\dot{x}-L_{1}(x)h(x) & \in\mathcal{P}^{\text{SOS}},\\
\frac{\partial h(x)}{\partial x}\dot{x}-L_{2}(x)h(x) & \in\mathcal{P}^{\text{SOS}}.
\end{aligned} \qquad (16)

Meanwhile, $h(x)$ can be rewritten in the square matrix representation form $h(x)=Z(x)^{\mathrm{T}}QZ(x)$, where $Q$ is a semi-definite coefficient matrix and $Z(x)$ is a monomial vector [13]. The trace $\textit{Tr}(\cdot)$ of $Q$ is used to approximate the volume of the barrier-certified ROA (BCROA).

3. Enlarge $\textit{Tr}(Q)$ to parameterize a permissive $h(x)$ with fixed $L_{1}(x)$ and $L_{2}(x)$,

\begin{aligned}
\underset{h(x)\in\mathcal{P}}{\max}\quad & \textit{Tr}(Q)\\
\text{s.t.}\ \ -\frac{\partial V(x)}{\partial x}\dot{x}-L_{1}(x)h(x) & \in\mathcal{P}^{\text{SOS}},\\
\frac{\partial h(x)}{\partial x}\dot{x}-L_{2}(x)h(x) & \in\mathcal{P}^{\text{SOS}}.
\end{aligned} \qquad (17)

The optimal barrier function $h^{*}(x)$, with a permissive BCROA, is found once the increase of $\textit{Tr}(Q)$ is less than a threshold; otherwise, steps 2 and 3 are repeated.

These three SOSPs show how to obtain an optimal BCROA directly. Thus, if a ROA exists for the learned polynomial system (12), a permissive BCROA can be computed by these SOSPs. A minimal sketch of the Gram-matrix idea underlying such programs is given below.
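As a minimal sketch of the Gram-matrix (square matrix representation) machinery behind these SOSPs, the toy feasibility problem below certifies that a fixed univariate polynomial is SOS by finding a positive semidefinite $Q$ with $p(x)=Z(x)^{\mathrm{T}}QZ(x)$; it uses CVXPY rather than the SOSOPT toolbox used in the paper, and is far simpler than the bilinear searches (15)-(17), which also treat $h(x)$, $L_{1}(x)$ and $L_{2}(x)$ as decision variables.

```python
import cvxpy as cp

# Certify p(x) = x^4 - 2x^3 + 4x^2 - 6x + 5 is SOS with monomial vector Z(x) = [1, x, x^2].
Q = cp.Variable((3, 3), symmetric=True)
constraints = [
    Q >> 0,                       # Q positive semidefinite
    Q[0, 0] == 5,                 # constant term
    2 * Q[0, 1] == -6,            # coefficient of x
    2 * Q[0, 2] + Q[1, 1] == 4,   # coefficient of x^2
    2 * Q[1, 2] == -2,            # coefficient of x^3
    Q[2, 2] == 1,                 # coefficient of x^4
]
prob = cp.Problem(cp.Minimize(0), constraints)
prob.solve()
print(prob.status)                # "optimal" means a Gram matrix (SOS certificate) exists
```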

4.2 Existence of Probabilistic BCROA

The relationship between a BCROA of the learned system (12) and the true dynamics is given in the following theorem.

Theorem 1.

Given $k$ measurements of a partially unknown system (9) such that we can obtain a learned system (12) with probability greater than or equal to $(1-\delta)^{k},\ \delta\in(0,1)$. Based on the SOSPs (15), (16) and (17), if there exists a BCROA $\mathcal{L}=\{x\in\mathcal{X}\,|\,h(x)\geq 0\}$ such that

\frac{\partial h(x)}{\partial x}\left(f(x)+P_{k}(x)+m_{d_{\xi}}(x)\right)\geq 0 \qquad (18)

holds for all states $x\in\mathcal{L}$, then $\mathcal{L}$ can be regarded as a BCROA of (9) with probability bounds $((1-\delta)^{k},1)$.

Proof.

Given an arbitrary initial state $x_{0}\in\mathcal{L}$ of the learned system (12), its state trajectory $\psi(t;x_{0}),\ t\in[0,\infty)$ is guaranteed to converge to the origin due to the Lyapunov constraints in (15), (16) and (17). Since the derivative of the barrier function $\frac{\partial h(\psi(t;x_{0}))}{\partial t}$ is non-negative along the trajectory, $h(\psi(t;x_{0}))$ is non-decreasing along $\psi(t;x_{0}),\ t\in[0,\infty)$. Thus, the BCROA $\mathcal{L}$ establishes a region with safety and stability guarantees for (12).

Since $\mathcal{L}$ is a certified region for (12), and (12) approximates (9) within probability bounds $((1-\delta)^{k},1)$, $\mathcal{L}$ is also a BCROA for the true dynamics (9) with the same probability bounds, which ends the proof. ∎

The BCROA in Theorem 1 that satisfies (15), (16) and (17) encodes the safety of the learned polynomial system (12). It further establishes a probabilistic relationship to the real ROA of the concerned system (9), so that we can make high-probability statements about system safety.

4.3 Probabilistic Optimal BCROA Estimate

To illustrate the procedure for finding probabilistic optimal BCROAs, we present an algorithm below and a corresponding flowchart in Figure 5, both of which consist of learning a polynomial system and solving SOSPs. In addition, the subsequent theorem gives the confidence range of the result of Algorithm 1 over episodes.

Input: Original system (1); initial sublevel set of a given Lyapunov function $\mathcal{L}_{0}=\{x\,|\,V(x)\leq c_{0}\}$; SOSP termination threshold $\epsilon$ and number of episodes $K$.
Output: Optimal BCROAs $R=[\mathcal{L}^{*}_{1},\mathcal{L}^{*}_{2},\dots,\mathcal{L}^{*}_{K}]^{\mathrm{T}}$; probability bounds $P=[\delta_{1},\delta_{2},\dots,\delta_{K}]^{\mathrm{T}}$.
1 Preprocess (1) into (9) by Chebyshev interpolants. Initialize the prior information set $D_{\xi}=\emptyset$.
2 Store the measurements of $d_{\xi}(x)$ in (9) as $D_{0}$.
3 for $i\in\{1,2,\dots,K\}$ do
4   Update the prior information set $D_{\xi}=D_{\xi}\cup D_{i-1}$.
5   Construct (12) with polynomial mean function $m(x)$.
6   Execute SOSP (15) and set the initial barrier function as $h_{i}(x)=c^{*}-V(x)$.
7   Execute SOSPs (16) and (17) to obtain the optimal BCROA $\mathcal{L}^{*}_{i}=\{x\,|\,h^{*}_{i}(x)\geq 0\}$.
8   Record the probability bound $\delta_{i}$ and update the measurements of $d_{\xi}(x)$ according to (13).
9 return $R,P$.
Algorithm 1: Optimal BCROA Estimate
[Flowchart: convert the original system (1) into the equivalent system (9); collect measurements of $d_{\xi}$ to enrich the prior information set $D_{\xi}$; construct the polynomial system (12) and execute the SOSPs (15), (16), (17) until $\Delta(\textit{Tr}(Q))\leq\epsilon$; collect $\mathcal{L}^{*}_{i}$; compute the probability factor $\delta_{i}$ and select the most uncertain point as in (13); repeat until the final episode; return $R,P$.]
Figure 5: Flowchart of Algorithm 1.
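The loop of Algorithm 1 can be sketched at a high level as follows; every argument is a hypothetical user-supplied callable standing in for one step of the paper (Chebyshev preprocessing, GP fitting with polynomial mean, the SOSPs (15)-(17), the confidence factor, and the sample policy (13)), so this is an organizational sketch rather than the paper's implementation.

```python
def estimate_bcroas(preprocess, measure, fit_model, sosp_sublevel, sosp_barrier,
                    confidence, next_sample, K):
    """High-level sketch of Algorithm 1 with user-supplied callables."""
    learned = preprocess()                      # (1) -> (9) via Chebyshev interpolants
    data = measure()                            # initial measurements of d_xi
    R, P = [], []
    for _ in range(K):
        model = fit_model(learned, data)        # learned polynomial system (12)
        c_star, V = sosp_sublevel(model)        # SOSP (15): maximum sublevel set
        h0 = lambda x, c=c_star: c - V(x)       # initial barrier h_i(x) = c* - V(x)
        h_star = sosp_barrier(h0, model)        # SOSPs (16)-(17): optimal barrier
        R.append(h_star)                        # BCROA {x | h_star(x) >= 0}
        P.append(confidence(data))              # probability factor delta_i
        data = data + next_sample(h_star, model, data)   # policy (13)
    return R, P
```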
Theorem 2.

Algorithm 1 establishes $K$ probabilistic ROAs $[\mathcal{L}^{*}_{1},\mathcal{L}^{*}_{2},\dots,\mathcal{L}^{*}_{K}]^{\mathrm{T}}$ with probabilities greater than or equal to $[(1-\delta_{1})^{n_{1}},(1-\delta_{2})^{n_{2}},\dots,(1-\delta_{K})^{n_{K}}]^{\mathrm{T}}$, where $n_{i}$ denotes the sample number of episode $i=1,2,\dots,K$. Then, the following probability bounds hold:

\begin{aligned}
\mathbb{P}\left\{\forall i\in[1,K],\ \forall\mathcal{L}^{*}_{i}\neq\emptyset,\ \mathcal{L}^{*}_{\cup}\subset D\right\} &=\min\{(1-\delta_{i})^{n_{i}}\},\\
\mathbb{P}\left\{\forall i\in[1,K],\ \forall\mathcal{L}^{*}_{i}\neq\emptyset,\ \mathcal{L}^{*}_{\cap}\subset D\right\} &=\max\{(1-\delta_{i})^{n_{i}}\},
\end{aligned} \qquad (19)

where $\mathcal{L}^{*}_{\cup}=\cup_{i=1}^{K}\mathcal{L}^{*}_{i}$ and $\mathcal{L}^{*}_{\cap}=\cap_{i=1}^{K}\mathcal{L}^{*}_{i}$ denote the union and the intersection of these optimal BCROAs, respectively, and $D$ denotes the exact ROA of the system (9).

Proof.

After executing the algorithm $K$ times, we obtain optimal BCROAs $[\mathcal{L}^{*}_{1},\mathcal{L}^{*}_{2},\dots,\mathcal{L}^{*}_{K}]^{\mathrm{T}}$ with probability bounds $[[(1-\delta_{1})^{n_{1}},1),[(1-\delta_{2})^{n_{2}},1),\dots,[(1-\delta_{K})^{n_{K}},1)]^{\mathrm{T}}$ toward the dynamical system (9). Since increasing $K$ generates a larger information set $D_{\xi}$, when $K\rightarrow\infty$ the GP model is inferred with the richest $D_{\xi}$, so the lower bound of this probabilistic range approaches $1$. Let $\mathcal{L}^{*}_{\cap}$ be the intersection of the $\mathcal{L}^{*}_{i}$, which represents the common part of these optimal BCROAs with respect to the exact dynamics of (9). Then, a conservative statement about $\mathcal{L}^{*}_{\cap}$ toward the real ROA $D$ in (9) is established with the minimum value of $\{(1-\delta_{i})^{n_{i}}\}$. The other equation in (19), about $\mathcal{L}^{*}_{\cup}$ toward $D$, is established in a similar way, which completes the proof. ∎

Theorem 2 presents a probabilistic statement for the $\mathcal{L}^{*}_{i}$ generated by Algorithm 1. Obviously, a higher-degree mean function $m_{d_{\xi}}(x)$ in (12) could approximate the real dynamics more precisely. Note that the shape of $\mathcal{L}^{*}_{i}$ varies with the uncertainty. It is also worth noting that the corresponding Lyapunov function $V(x)$ and the barrier function $h(x)$ could increase their degrees simultaneously for a better estimate: a higher-degree polynomial certified ROA can explore more area, so the probabilistic bound narrows naturally.
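As a tiny illustration of the bounds in (19), the snippet below computes the per-episode probabilities $(1-\delta_{i})^{n_{i}}$ for illustrative values of $\delta_{i}$ and $n_{i}$ and takes their minimum and maximum.

```python
# Probability bounds from (19) for illustrative per-episode values (delta_i, n_i).
deltas = [0.01, 0.008, 0.005]
ns = [50, 80, 120]

per_episode = [(1 - d) ** n for d, n in zip(deltas, ns)]
print("union bound       :", min(per_episode))   # probability attached to the union set
print("intersection bound:", max(per_episode))   # probability attached to the intersection set
```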

5 Numerical Examples

Based on the Matlab toolboxes Chebfun, GPML and SOSOPT together with the Mosek solver, we estimate the BCROA of autonomous partially unknown systems in two examples.

5.1 Example 4: A 2D Nonlinear System

Extending Example 3, we consider the system:

\begin{bmatrix}\dot{x}_{1}\\ \dot{x}_{2}\end{bmatrix}=\begin{bmatrix}-x_{1}+x_{2}\\ x_{1}^{2}x_{2}+1-\sqrt{|\exp(x_{1})\cos(x_{1})|}\end{bmatrix}+\begin{bmatrix}0\\ d(x)\end{bmatrix}. \qquad (20)

Example 3 shows the construction of the first learned dynamical system in Figure 3(b), where the noise over the system measurements is bounded by $0.01$. The learned GP models consist of a $2^{nd}$ degree polynomial mean function and an RBF kernel $k(x,x^{\prime})$ with signal variance $\exp(0.1)$ and length scale $\exp(0.2)$.

At each episode, a given LCROA $\{x\,|\,V(x)\leq c_{0}=0.1\}$ is passed to the next step, where $V(x)=2x_{1}^{4}+0.5x_{2}^{4}-x_{1}^{2}x_{2}^{2}+x_{1}^{2}+x_{2}^{2}-0.5x_{1}x_{2}$ is a $4^{th}$ degree polynomial. Besides, we obtained the probability bounds with a fixed discounting factor $\sqrt{\beta}=2$ according to the method in [21]. After $10$ episodes, the probability bounds of these optimal BCROAs are $\mathbb{P}\{\forall\mathcal{L}_{i\in[1,10]}^{*},\ \mathcal{L}^{*}_{i}\subset D\}\in[0.9500,0.9700]$, where $D$ denotes the exact ROA of (20).

Figure 6: ROA estimate of (20). The red region enclosed by the yellow dashed line depicts the LCROA and the blue region enclosed by the black solid line depicts the optimal BCROA. The gray lines and the red arrows depict the corresponding vector field of (20) without the unknown term $d(x)$.

In Figure 6, each computed optimal BCROA is displayed with a fixed transparency. The intersection of these optimal BCROAs is established with probability in $[0.9700,1.0000]$, while their union is established with probability in $[0.9500,1.0000]$. Therefore, the probabilistic optimal BCROAs in Figure 6 are not only clearly larger than the LCROA, but also guarantee system safety within an explicit probability range.

5.2 Example 5: A 3D Nonlinear System

Consider a 3D nonlinear system corrupted by unknown terms $d_{1}(x),d_{3}(x)$ in the form of (1),

\begin{bmatrix}\dot{x}_{1}\\ \dot{x}_{2}\\ \dot{x}_{3}\end{bmatrix}=\begin{bmatrix}-x_{1}^{2}-\cos(x_{1}^{2})\sin(x_{1})\\ -x_{2}-x_{1}^{3}x_{2}\\ -x_{1}^{2}x_{3}+1-\sqrt{|\exp(x_{1})\cos(x_{1})|}\end{bmatrix}+\begin{bmatrix}d_{1}(x)\\ 0\\ d_{3}(x)\end{bmatrix}. \qquad (21)

The non-polynomial terms in (21) are approximated by $4^{th}$ degree Chebyshev interpolants in $[-2,2]^{3}$. To learn the system dynamics, we first collect the trajectory starting at $(-0.1,-0.1,0.1)$, where the noises over $d_{1}(x)$ and $d_{3}(x)$ are bounded by $0.01$. In this case, we use a $3^{rd}$ degree polynomial mean function and an RBF kernel with signal variance $\exp(0.01)$ and length scale $\exp(0.2)$.

A sublevel set $\{x\,|\,V(x)\leq c_{0}=5\times 10^{-3}\}$ is given to construct the LCROA with the $4^{th}$ order $V(x)=10x_{1}^{4}+x_{2}^{4}+20x_{3}^{4}+2x_{1}^{2}x_{2}^{2}-4x_{3}^{2}x_{2}^{2}+3x_{1}^{2}x_{3}^{2}$. Setting $\sqrt{\beta}=2$, the probability bounds of $10$ optimal BCROAs are computed as $\mathbb{P}\{\forall\mathcal{L}_{i=1}^{10},\ \mathcal{L}_{i}\subset D\}\in[0.8395,0.9567]$.

These optimal BCROAs are displayed in Figure 7 with a fixed transparency. From Figure 7, we see clear differences among these optimal BCROAs. Furthermore, as seen in Table 1, with the same hyperparameters, the computation time of Example 5 is significantly greater than that of Example 4 in the early episodes. However, at episode 10, Example 5 is the first to stop learning due to the high computational cost. Although the higher-dimensional system still finds a comparable BCROA here, this is often not the case, due to the fragile and changeable dynamics under uncertainty.

Figure 7: ROA estimates of (21). The red cube enclosed by the solid lines depicts the LCROA and the blue cubes enclosed by the dashed lines denote the optimal BCROAs. The vector field is projected onto the plane.
Table 1: Computation time $t_{c}$ [sec] of Algorithm 1.

             Episode 1    Episode 5    Episode 10
Example 4         35.6        168.3        1752.6
Example 5        406.4        837.8        1329.4

6 Conclusion

In this work, a method is proposed to compute a barrier-certified region of attraction such that stability and safety can both be guaranteed for a class of partially unknown systems. The proposed method is built on Chebyshev interpolants, Gaussian processes, sum-of-squares programming and a safe sample policy. The effectiveness of the proposed algorithm has been demonstrated via two numerical examples.

On the other hand, we acknowledge that a large number of unknown terms may affect the accuracy and efficiency of the proposed algorithm. Possible solutions to this issue could be to construct a more efficient sample policy [29] or to exploit local GP regression [30]. Thus, future efforts will be devoted to prior data selection and to further enlarging the estimated region of attraction for partially unknown systems [13, 17].

References

  • [1] E. Glassman, A. L. Desbiens, M. Tobenkin, M. Cutkosky, and R. Tedrake, “Region of attraction estimation for a perching aircraft: A Lyapunov method exploiting barrier certificates,” in Proceedings of the International Conference on Robotics and Automation, pp. 2235–2242, 2012.
  • [2] L. Wang, E. A. Theodorou, and M. Egerstedt, “Safe learning of quadrotor dynamics using barrier certificates,” in Proceedings of the International Conference on Robotics and Automation, pp. 2460–2465, 2018.
  • [3] S.-C. Hsu, X. Xu, and A. D. Ames, “Control barrier function based quadratic programs with application to bipedal robotic walking,” in Proceedings of the American Control Conference, pp. 4542–4548, 2015.
  • [4] P. Magne, D. Marx, B. Nahid-Mobarakeh, and S. Pierfederici, “Large-Signal Stabilization of a DC-Link Supplying a Constant Power Load Using a Virtual Capacitor: Impact on the Domain of Attraction,” IEEE Transactions on Industry Applications, vol. 48, no. 3, pp. 878–887, 2012.
  • [5] G. Chesi, Domain of attraction: Analysis and control via SOS programming, vol. 415. Springer Science & Business Media, 2011.
  • [6] V. I. Zubov, Methods of AM Lyapunov and their application. P. Noordhoff, 1964.
  • [7] F. Blanchini, “Set invariance in control,” Automatica, vol. 35, no. 11, pp. 1747–1767, 1999.
  • [8] A. D. Ames, S. Coogan, M. Egerstedt, G. Notomista, K. Sreenath, and P. Tabuada, “Control barrier functions: Theory and applications,” in Proceedings of the European Control Conference, pp. 3420–3431, 2019.
  • [9] A. D. Ames, X. Xu, J. W. Grizzle, and P. Tabuada, “Control barrier function based quadratic programs for safety critical systems,” IEEE Transactions on Automatic Control, vol. 62, no. 8, pp. 3861–3876, 2016.
  • [10] P. A. Parrilo, Structured semidefinite programs and semialgebraic geometry methods in robustness and optimization. PhD thesis, California Institute of Technology, 2000.
  • [11] I. R. Manchester, M. M. Tobenkin, M. Levashov, and R. Tedrake, “Regions of attraction for hybrid limit cycles of walking robots,” IFAC Proceedings Volumes, vol. 44, no. 1, pp. 5801–5806, 2011.
  • [12] A. El-Guindy, D. Han, and M. Althoff, “Estimating the region of attraction via forward reachable sets,” in Proceedings of the American Control Conference, pp. 1263–1270, 2017.
  • [13] L. Wang, D. Han, and M. Egerstedt, “Permissive barrier certificates for safe stabilization using sum-of-squares,” in Proceedings of the American Control Conference, pp. 585–590, 2018.
  • [14] D. Han and M. Althoff, “On estimating the robust domain of attraction for uncertain non-polynomial systems: An LMI approach,” in Proceedings of the Conference on Decision and Control, pp. 2176–2183, 2016.
  • [15] A. Iannelli, A. Marcos, and P. Seiler, “An equilibrium-independent region of attraction formulation for systems with uncertainty-dependent equilibria,” in Proceedings of the Conference on Decision and Control, pp. 725–730, 2018.
  • [16] J. Vinogradska, B. Bischoff, D. Nguyen-Tuong, and J. Peters, “Stability of controllers for Gaussian process dynamics,” The Journal of Machine Learning Research, vol. 18, no. 1, pp. 3483–3519, 2017.
  • [17] J. Umlauft, L. Pöhler, and S. Hirche, “An uncertainty-based control Lyapunov approach for control-affine systems modeled by Gaussian process,” IEEE Control Systems Letters, vol. 2, no. 3, pp. 483–488, 2018.
  • [18] F. Berkenkamp, R. Moriconi, A. P. Schoellig, and A. Krause, “Safe learning of regions of attraction for uncertain, nonlinear systems with Gaussian processes,” in Proceedings of the Conference on Decision and Control, pp. 4661–4666, 2016.
  • [19] J. Umlauft, A. Lederer, and S. Hirche, “Learning stable Gaussian process state space models,” in Proceedings of the American Control Conference, pp. 1499–1504, 2017.
  • [20] A. Devonport, H. Yin, and M. Arcak, “Bayesian safe learning and control with sum-of-squares analysis and polynomial kernels,” in Proceedings of the Conference on Decision and Control, pp. 3159–3165, 2020.
  • [21] P. Jagtap, G. J. Pappas, and M. Zamani, “Control barrier functions for unknown nonlinear systems using Gaussian processes,” in Proceedings of the Conference on Decision and Control, pp. 3699–3704, 2020.
  • [22] D. Han and M. Althoff, “Estimating the domain of attraction based on the invariance principle,” in Proceedings of the Conference on Decision and Control, pp. 5569–5576, 2016.
  • [23] H. K. Khalil, Nonlinear systems; 3rd ed. Prentice-Hall, 2002.
  • [24] C. K. Williams and C. E. Rasmussen, Gaussian processes for machine learning, vol. 2. MIT press, 2006.
  • [25] V. I. Paulsen and M. Raghupathi, An introduction to the theory of reproducing kernel Hilbert spaces, vol. 152. Cambridge university press, 2016.
  • [26] L. N. Trefethen, Approximation Theory and Approximation Practice. SIAM, 2019.
  • [27] N. Srinivas, A. Krause, S. M. Kakade, and M. Seeger, “Gaussian process optimization in the bandit setting: No regret and experimental design,” arXiv preprint: 0912.3995, 2009.
  • [28] M. Putinar, “Positive polynomials on compact semi-algebraic sets,” Indiana University Mathematics Journal, vol. 42, no. 3, pp. 969–984, 1993.
  • [29] J. Umlauft, T. Beckers, A. Capone, A. Lederer, and S. Hirche, “Smart forgetting for safe online learning with Gaussian Processes,” in Learning for Dynamics and Control, pp. 160–169, PMLR, 2020.
  • [30] D. Nguyen-Tuong, M. Seeger, and J. Peters, “Model learning with local Gaussian Process regression,” Advanced Robotics, vol. 23, no. 15, pp. 2015–2034, 2009.
  • [31] C. Santoyo, M. Dutreix, and S. Coogan, “A barrier function approach to finite-time stochastic system verification and control,” Automatica, vol. 125, p. 109439, 2021.
  • [32] H. König, Eigenvalue distribution of compact operators, vol. 16. Birkhäuser, 2013.

Appendix: Proof of Proposition 1

Proof.

First, Mercer's Theorem [32] allows us to represent a kernel by a feature basis vector $\phi(\cdot)$. In this paper, we restrict our attention to the RBF kernel $k(x,x^{\prime})$, though our methodology supports more general kernels,

\begin{aligned}
k(x,x^{\prime}) &=\sigma^{2}\exp\Big(\frac{(x-x^{\prime})^{2}}{-2l^{2}}\Big)\\
&=\sum\limits_{i=0}^{\infty}\frac{\sigma^{2}2^{i}}{2l^{2}i!}\exp\Big(\frac{x^{2}}{-2l^{2}}\Big)\Big(\frac{x}{2l^{2}}\Big)^{i}\exp\Big(\frac{(x^{\prime})^{2}}{-2l^{2}}\Big)\Big(\frac{x^{\prime}}{2l^{2}}\Big)^{i}\\
&=\phi(x)^{\mathrm{T}}\phi(x^{\prime}),
\end{aligned} \qquad (22)

where $\sigma$ denotes the signal variance, $l$ denotes the length scale and $\phi(x)$ denotes an infinite-dimensional vector,

\phi(x)=\exp\Big(\frac{x^{2}}{-2l^{2}}\Big)\cdot\Big(\sqrt{\frac{\sigma^{2}2^{0}}{2l^{2}0!}}\Big(\frac{x}{2l^{2}}\Big)^{0},\sqrt{\frac{\sigma^{2}2^{1}}{2l^{2}1!}}\Big(\frac{x}{2l^{2}}\Big)^{1},\dots\Big). \qquad (23)

Then, since $d(x)$ and its measurement $y(x)$ in (1) are bounded in the RKHS, we can write

d(x)=\phi(x)^{\mathrm{T}}w,\qquad y(x)=d(x)+\epsilon, \qquad (24)

where $w$ denotes the weight vector and $\epsilon$ denotes the noise. Let $p(w)\sim\mathcal{N}(0,\sigma_{p}^{2})$ be the prior distribution of $w$ and let $p(w|y,x)$ be the posterior distribution of $w$ given $y$ and $x$; their relationship follows from Bayes' rule,

\text{posterior}=\frac{\text{likelihood}\times\text{prior}}{\text{marginal likelihood}},\qquad p(w|y,x)=\frac{p(y|x,w)\cdot p(w)}{p(y|x)}. \qquad (25)

Next, let $K_{1}$ be the weight-independent marginal likelihood $p(y|x)$. The GP evaluates the likelihood $p(y|x,w)$ over the measurements $[(x_{1},y_{1}),(x_{2},y_{2}),\dots,(x_{k},y_{k})]^{\mathrm{T}}$ as

p(y|x,w)=\prod_{i=1}^{k}p\left(y_{i},x_{i},w\right)=\prod^{k}_{i=1}\frac{\exp\big(\frac{(y_{i}-\phi(x_{i})^{\mathrm{T}}w)^{2}}{-2\sigma_{n}^{2}}\big)}{\sigma_{n}\sqrt{2\pi}}, \qquad (26)

which allows us to compute $p(w|y,x)$ based on (25) and (26),

\begin{aligned}
p(w|x,y) &=\frac{\prod\limits_{i=1}^{k}p\left(y_{i},x_{i},w\right)\cdot\frac{1}{\sqrt{2\pi\sigma^{2}_{p}}}\exp\big(\frac{w^{\mathrm{T}}\sigma_{p}^{-2}w}{-2}\big)}{K_{1}}\\
&=K_{2}\cdot\exp\Big(\frac{w^{\mathrm{T}}\sigma_{p}^{-2}w}{-2}\Big)\cdot\exp\Big(\frac{\sum\limits_{i=1}^{k}(y_{i}-\phi(x_{i})^{\mathrm{T}}w)^{2}}{-2\sigma_{n}^{2}}\Big),
\end{aligned} \qquad (27)

where $K_{2}=\frac{1}{K_{1}\cdot\sqrt{2\pi\sigma_{p}^{2}}\cdot(\sigma_{n}\sqrt{2\pi})^{k}}$.

Using Maximum a Posteriori estimation, the optimal weight $\hat{w}$ can be computed as

\begin{aligned}
\hat{w} &=\arg\max_{w}\;p(w|x,y)=\arg\max_{w}\;\log p(w|x,y)\\
&=\arg\min_{w}\;-\log p(w|x,y)\\
&=\arg\min_{w}\;-\log K_{2}+\frac{w^{\mathrm{T}}\sigma_{p}^{-2}w}{2}+\frac{\sum\limits_{i=1}^{k}(y_{i}-\phi(x_{i})^{\mathrm{T}}w)^{2}}{2\sigma_{n}^{2}}\\
&=\arg\min_{w}\;w^{\mathrm{T}}(\sigma_{n}^{2}\sigma_{p}^{-2})w+\sum\limits_{i=1}^{k}(y_{i}-\phi(x_{i})^{\mathrm{T}}w)^{2}\\
&=\arg\min_{w}\;w^{\mathrm{T}}(\phi(x)^{\mathrm{T}}\phi(x)+\sigma_{n}^{2}\sigma_{p}^{-2})w-2w^{\mathrm{T}}\phi(x)^{\mathrm{T}}y+y^{\mathrm{T}}y\\
&=\arg\min_{w}\;L(w).
\end{aligned} \qquad (28)

The derivative of $L(w)$ in (28) can be computed as

\frac{\partial L(w)}{\partial w}=2(\phi(x)^{\mathrm{T}}\phi(x)+\sigma_{n}^{2}\sigma_{p}^{-2})w-2\phi(x)^{\mathrm{T}}y, \qquad (29)

and the optimal weight $\hat{w}$ can be parameterized directly,

\begin{aligned}
\hat{w} &=(\phi(x)^{\mathrm{T}}\phi(x)+\sigma_{n}^{2}\sigma_{p}^{-2})^{-1}\phi(x)^{\mathrm{T}}y\\
&=\phi(x)^{\mathrm{T}}(\phi(x)\phi(x)^{\mathrm{T}}+\sigma_{n}^{2}\sigma_{p}^{-2})^{-1}y\\
&=\sigma_{p}^{2}\phi(x)^{\mathrm{T}}(\phi(x)\sigma_{p}^{2}\phi(x)^{\mathrm{T}}+\sigma_{n}^{2})^{-1}y.
\end{aligned} \qquad (30)

Thus, we can obtain the optimal $p(\hat{w}|x,y)$ as (27) shows,

p(\hat{w}|x,y)\sim\big(\sigma_{p}^{2}\phi(x)^{\mathrm{T}}(\phi(x)\sigma_{p}^{2}\phi(x)^{\mathrm{T}}+\sigma_{n}^{2})^{-1}y,\ (\phi(x)\sigma_{p}^{2}\phi(x)^{\mathrm{T}}+\sigma_{n}^{2})^{-1}\big), \qquad (31)

where the predictive output $d(x_{*})$ at a query state $x_{*}$ is

d(x_{*})=\phi(x_{*})^{\mathrm{T}}\hat{w}. \qquad (32)

Here $\phi(\cdot)$ is an infinite-dimensional vector used to approximate $y$ with the optimal weight $\hat{w}$. Without loss of generality, we can truncate $\phi(x)$ to a finite monomial vector $\varphi(x)$ without affecting the inference process from (26) to (31). This reshapes the mean function into polynomial form and adjusts the second term in (31) as

\begin{aligned}
m(x_{*}) &=\varphi(x_{*})^{\mathrm{T}}\hat{w},\\
\sigma^{2}(x_{*}) &=k(x_{*},x_{*})-k_{*}^{\mathrm{T}}(K+\sigma_{n}^{2}I)^{-1}k_{*},
\end{aligned} \qquad (33)

where $[K]_{(i,j)}=k(x_{i},x_{j})$ is a kernel Gramian matrix and $k_{*}=[k(x_{1},x_{*}),k(x_{2},x_{*}),\dots,k(x_{k},x_{*})]^{\mathrm{T}}$. Thus, Proposition 1 is obtained, which completes the proof. ∎
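The weight-space identity used in the second and third lines of (30) is the standard push-through identity; the snippet below (an illustration, not part of the proof) verifies it numerically for a random design matrix $\Phi$ whose rows are the truncated feature vectors.

```python
import numpy as np

# Numerically check the matrix identity used in (30):
# (Phi^T Phi + sn^2 sp^-2 I)^-1 Phi^T y  ==  sp^2 Phi^T (sp^2 Phi Phi^T + sn^2 I)^-1 y,
# with Phi the k x d design matrix whose rows are the (truncated) feature vectors.
rng = np.random.default_rng(0)
k, d, sn, sp = 20, 5, 0.1, 0.7
Phi, y = rng.normal(size=(k, d)), rng.normal(size=k)

w1 = np.linalg.solve(Phi.T @ Phi + (sn**2 / sp**2) * np.eye(d), Phi.T @ y)
w2 = sp**2 * Phi.T @ np.linalg.solve(sp**2 * Phi @ Phi.T + sn**2 * np.eye(k), y)
print(np.allclose(w1, w2))   # True: both expressions give the same weights
```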