
Perception-Based Sampled-Data Optimization of Dynamical Systems

Liliaokeawawa Cothren, Gianluca Bianchin, Sarah Dean, Emiliano Dall’Anese
Department of ECEE, University of Colorado Boulder; ICTEAM Institute, Université Catholique de Louvain; Department of Computer Science, Cornell University
Abstract

Motivated by perception-based and sensing-based control problems in autonomous systems, this paper addresses the problem of developing feedback controllers to regulate the inputs and the states of a dynamical system to optimal solutions of an optimization problem when one has no access to exact measurements of the system states. In particular, we consider the case where the states need to be estimated from high-dimensional sensory data received only at some time instants. We develop a sampled-data feedback controller that is based on adaptations of a projected gradient descent method and includes neural networks as integral components to estimate the state of the system from perceptual information. We derive sufficient conditions to guarantee (local) input-to-state stability of the control loop. Moreover, we show that the interconnected system tracks the solution trajectory of the underlying optimization problem up to an error that depends on the approximation errors of the neural network and on the time-variability of the optimization problem; the latter originates from time-varying safety and performance objectives, input constraints, and unknown disturbances.

Keywords: Learning for control, data-driven control, feedback optimization, output regulation.

This work was supported in part by the National Science Foundation through the award CMMI 2044946. Corresponding author: Liliaokeawawa Cothren (Liliaokeawawa.Cothren@colorado.edu).

1 Introduction

A major challenge in controlling complex autonomous systems is incorporating rich perceptual and sensor data. The performance of feedback control systems critically relies on the information extracted from perceptual sensing, which may require the processing of high-dimensional data only available at given spatio-temporal granularities; see, e.g., Dean and Recht (2021); Al Makdah et al. (2020); Xu et al. (2021); Dawson et al. (2022). This paper investigates how to integrate perceptual information into controllers inspired by optimization algorithms, where the goal is to steer a dynamical system toward solutions of an optimization problem with costs associated with the system’s inputs and states. For example, in autonomous driving, the optimization problem may formalize way-point tracking and obstacle avoidance, with the relevant information provided by images from a dashboard camera. Other examples include robotics and power systems (the latter leveraging pseudo-measurements).

The line of research on feedback optimization goes back to earlier concepts of KKT-type controllers in Jokic et al. (2009); Brunner et al. (2012); Hirata et al. (2014), and it was recently expanded to include new classes of controllers inspired by first-order optimization methods in Colombino et al. (2020); Zheng et al. (2020); Hauswirth et al. (2021); Lawrence et al. (2020); Bianchin et al. (2021, 2022); Belgioioso et al. (2021); Agarwal et al. (2022); Simpson-Porco (2021); also see references therein. In this paper, we provide contributions relative to existing works by considering a setup where the cost of the optimization problem evolves over time (for instance, for way-point tracking and to avoid moving obstacles) and the state of the system cannot be directly measured. The latter is a distinctive feature of this work: we address the case in which optimization-based controllers must leverage perceptual information available at given temporal granularities and learning mechanisms to estimate the system state.

We develop a sampled-data feedback controller that is based on an adaptation of a projected gradient descent method. Based on the specified time-varying costs, the gradient-based controller generates inputs for the system which are then passed through a zero-order hold. Importantly, the controller leverages a trained neural network that maps perceptual information into estimates of the state of the system. We derive sufficient conditions to guarantee (local) input-to-state stability (ISS) of the control loop. In particular, we show that the interconnected system tracks the optimal solution trajectory of the optimization problem up to an error that depends on the approximation errors of the neural network and the time-variability of the cost and unknown disturbance. The ISS bounds are derived by leveraging the fundamental results of Jiang and Wang (2001) and Nešić et al. (1999).

We note that a similar perception-based regulation problem was considered in our prior work Cothren et al. (2022a); however, the controllers in Cothren et al. (2022a) operate in continuous time and thus do not account for the sampled-data nature of the feedback information. A sampled-data controller was developed in Belgioioso et al. (2021); with respect to Belgioioso et al. (2021), we consider cases where the optimization problem is time-varying and the state is estimated via perception maps.

We test our controller on an autonomous driving application where vehicles are modeled via unicycle dynamics; the controller acquires the position of the vehicle from a neural network that estimates positions from images.

2 Problem Formulation

In the following, we formalize our research problem and discuss the necessary assumptions.

Notation. We denote by $\mathbb{N}$, $\mathbb{Z}_{+}$, $\mathbb{R}$, $\mathbb{R}_{>0}$, and $\mathbb{R}_{\geq 0}$ the sets of natural numbers, positive integers, real numbers, positive real numbers, and non-negative real numbers, respectively. For vectors $x\in\mathbb{R}^{n}$ and $u\in\mathbb{R}^{m}$, $\|x\|$ is the Euclidean norm of $x$, $\|x\|_{\infty}$ is the supremum norm, and $(x,u)\in\mathbb{R}^{n+m}$ is their vector concatenation; $x^{\top}$ denotes transposition, and $x_{i}$ denotes the $i$-th element of $x$. For a matrix $A\in\mathbb{R}^{n\times m}$, $\|A\|$ is the induced $2$-norm and $\|A\|_{\infty}$ is the supremum norm. The set $\mathcal{B}_{n}(r):=\{z\in\mathbb{R}^{n}:\|z\|<r\}$ is the open ball in $\mathbb{R}^{n}$ with radius $r>0$; $\mathcal{B}_{n}[r]:=\{z\in\mathbb{R}^{n}:\|z\|\leq r\}$ is the closed ball. Given two sets $\mathcal{X}\subset\mathbb{R}^{n}$ and $\mathcal{Y}\subset\mathbb{R}^{m}$, $\mathcal{X}\times\mathcal{Y}$ is their Cartesian product; $\mathcal{X}+\mathcal{B}_{n}(r)$ is the open set $\{x+y:x\in\mathcal{X},\,y\in\mathcal{B}_{n}(r)\}$. $\Pi_{\mathcal{U}}$ is the Euclidean projection onto $\mathcal{U}\subset\mathbb{R}^{n}$; that is, $\Pi_{\mathcal{U}}(z):=\arg\min_{u\in\mathcal{U}}\|u-z\|^{2}$ for $z\in\mathbb{R}^{n}$. A function $\gamma:\mathbb{R}_{\geq 0}\to\mathbb{R}_{\geq 0}$ is of class $\mathcal{K}$ if it is continuous, strictly increasing, and $\gamma(0)=0$; it is of class $\mathcal{K}_{\infty}$ if it is additionally unbounded. A function $\beta:\mathbb{R}_{\geq 0}\times\mathbb{R}_{\geq 0}\to\mathbb{R}_{\geq 0}$ is of class $\mathcal{KL}$ if, for each fixed $t$, the function $r\mapsto\beta(r,t)$ is of class $\mathcal{K}$ and, for each fixed $r$, the function $t\mapsto\beta(r,t)$ is decreasing and satisfies $\beta(r,t)\to 0$ as $t\to\infty$.

We consider systems that can be modeled using dynamics of the form:

$$\dot{x}=f(x,u,w),\quad x(0)=x_{0}, \tag{1}$$

where $x:\mathbb{R}_{\geq 0}\to\mathcal{X}\subseteq\mathbb{R}^{n_{x}}$ is the state, $u:\mathbb{R}_{\geq 0}\to\mathcal{U}\subseteq\mathbb{R}^{n_{u}}$ is the control input, $w:\mathbb{R}_{\geq 0}\to\mathcal{W}\subseteq\mathbb{R}^{n_{w}}$ is a time-varying unknown exogenous disturbance, and the vector field $f:\mathcal{X}\times\mathcal{U}\times\mathcal{W}\to\mathbb{R}^{n_{x}}$ is continuously differentiable on the open and connected domain $\mathcal{X}\times\mathcal{U}\times\mathcal{W}$ and is Lipschitz continuous in its arguments with constants $L_{x}$, $L_{u}$, and $L_{w}$, respectively. In this paper, motivated by practical hardware and operational requirements, we restrict our attention to cases where $u\in\mathcal{U}_{c}$ at all times, where $\mathcal{U}_{c}\subset\mathcal{U}$ is compact. We impose the following additional assumptions on the system (1).

Assumption 1

There exists a unique continuously differentiable map $h:\mathcal{U}\times\mathcal{W}\to\mathcal{X}$ such that, for any (constant) $\bar{u}\in\mathcal{U}$ and $\bar{w}\in\mathcal{W}$, $f\left(h(\bar{u},\bar{w}),\bar{u},\bar{w}\right)=0$. Moreover, $h(u,w)$ admits the decomposition $h(u,w)=h_{u}(u)+h_{w}(w)$, where $h_{u}$ and $h_{w}$ are Lipschitz continuous with constants $\ell_{h_{u}}$ and $\ell_{h_{w}}$, respectively. $\square$

Assumption 2

For all $t\in\mathbb{R}_{\geq 0}$, $w(t)\in\mathcal{W}_{c}$, where $\mathcal{W}_{c}\subset\mathcal{W}$ is compact. Moreover, $t\mapsto w(t)$ is continuous. $\square$

Assumption 1 guarantees that, for constant inputs $\bar{u}$ and $\bar{w}$, system (1) admits the unique equilibrium point $\bar{x}:=h(\bar{u},\bar{w})$. Note that when $\nabla_{x}f(x,\bar{u},\bar{w})$ is invertible for any $\bar{u}$ and $\bar{w}$, the existence of $h(\bar{u},\bar{w})$ is always guaranteed. Furthermore, by the implicit function theorem, $h(\bar{u},\bar{w})$ is differentiable since $f(x,\bar{u},\bar{w})$ is differentiable. We also note that the equilibrium set $\mathcal{X}_{eq}:=\{h(\bar{u},\bar{w}):\bar{u}\in\mathcal{U}_{c},\,\bar{w}\in\mathcal{W}_{c}\}$ is compact; this is due to $\mathcal{U}_{c}\times\mathcal{W}_{c}$ being compact, $h(u,w)$ being continuously differentiable, and the result (Rudin, 1976, Thm. 4.14). For any $u\in\mathcal{U}_{c}$, we have that $\|\nabla_{u}h(u,\bar{w})\|\leq\ell_{h_{u}}$ since $\mathcal{U}_{c}$ is compact; see Rudin (1976).

Next, as customary in the context of feedback optimization (see, e.g., Colombino et al. (2020); Hauswirth et al. (2021)), we require a stability condition on the system to be controlled. To this end, let $\mathcal{X}_{r}:=\mathcal{X}_{eq}+\mathcal{B}_{n}(r)\subseteq\mathcal{X}$, $r>0$, be a set for which the following assumption holds.

Assumption 3

There exists a continuously differentiable function $V:\mathcal{X}_{r}\times\mathcal{U}\times\mathcal{W}\to\mathbb{R}$, constants $d_{1},d_{2},d_{3}>0$, and a $\mathcal{K}$-function $\sigma_{w}$ such that:

  1. For all $x\in\mathcal{X}$, $u\in\mathcal{U}$, and $w\in\mathcal{W}$,

    $$d_{1}\|\tilde{x}\|^{2}\leq V(\tilde{x},u,w)\leq d_{2}\|\tilde{x}\|^{2},$$

    where $\tilde{x}:=x-h(u,w)$;

  2. For any constant $u\in\mathcal{U}$,

    $$\dot{V}(x(t),u,w(t))\leq-d_{3}V(x(t),u,w(t))+\sigma_{w}(\|\dot{w}(t)\|).$$

$\square$

From Khalil (2002), Assumption 3 implies that there exist constants $k,a,\gamma>0$ such that

$$\|\tilde{x}(t)\|\leq k\|\tilde{x}(0)\|e^{-at}+\gamma\sigma_{w}\left(\sup_{0\leq\tau\leq t}\|\dot{w}(\tau)\|\right)$$

for any $x(0)\in\mathcal{X}_{0}:=\mathcal{X}_{eq}+\mathcal{B}_{n}(r_{0})$, where $r_{0}<(r-\operatorname{diam}(\mathcal{X}_{eq}))/k-\bar{\gamma}$ and $\bar{\gamma}:=\gamma\sigma_{w}\left(\sup_{t\geq 0}\|\dot{w}(t)\|\right)$.

We point out that when the (physical) system does not satisfy Assumption 3, then (1) models the pre-stabilized physical system.

2.1 Generative and Perception Maps

In this paper, we assume that the state $x$ of (1) is not directly measurable. Instead, one has access to nonlinear and possibly high-dimensional observations of the state $\zeta=q(x)$, where $q:\mathcal{X}\to\mathbb{R}^{n_{\zeta}}$ is an unknown generative map. This setup emerges when information about the state is acquired through perceptual information from sensing and estimation mechanisms. For example, in autonomous driving applications, vehicle states are often reconstructed from images generated by cameras. See, for example, the models in Dean and Recht (2021); Al Makdah et al. (2020); Murillo-González and Poveda (2022); also see the closely related observer design problems in Marchi et al. (2022); Chou et al. (2022).

Regarding the unknown map $x\mapsto q(x)$, we make the following assumption (see also Dean and Recht (2021)).

Assumption 4

The map $q:\mathcal{X}\to\mathbb{R}^{n_{\zeta}}$ is such that the image $q(\mathcal{X}^{\prime})$ is compact for any compact set $\mathcal{X}^{\prime}\subset\mathcal{X}$. Further, there exists a map $p:\mathbb{R}^{n_{\zeta}}\to\mathbb{R}^{n_{x}}$ such that $p(\zeta)=p(q(x))=x+\varepsilon(x)$, where $\varepsilon(x)$ is bounded as $\|\varepsilon(x)\|\leq\bar{\varepsilon}$ for any $x\in\mathcal{X}$, for a given finite $\bar{\varepsilon}\geq 0$. $\square$

The function $p(\zeta)$ in Assumption 4 is referred to as the perception map; for a given observation $\zeta$, it yields a possibly noisy estimate of the state, up to a bounded error $\varepsilon(x)$. In this paper, we will leverage supervised learning methods to estimate the perception map $p(\zeta)$ from data.

2.2 Regulation to Solutions of an Optimization Problem

We focus on regulating the system (1) to the solution of the following time-varying optimization problem:

$$(u^{*}(t),x^{*}(t))\in\arg\min_{\bar{u}\in\mathcal{U}_{c}}~\phi(\bar{u},t)+\psi(\bar{x},t) \tag{2a}$$
$$\text{s.t.}~~\bar{x}=h(\bar{u},w(t)), \tag{2b}$$

where $u\mapsto\phi(u,t)$ and $x\mapsto\psi(x,t)$ are functions that describe costs associated with the system’s inputs and states, respectively. We remark that (2) is a time-varying optimization problem for two reasons: (i) the cost functions are time-varying, which allows us to account for performance and safety objectives that evolve over time, and (ii) the constraint is time-varying since the system’s equilibrium point is parametrized by the time-varying signal $w(t)$. Accordingly, (2) defines optimal trajectories $t\mapsto(u^{*}(t),x^{*}(t))$ for the system (1). Note that, since $h(u,w)$ is unique for any fixed $u$ and $w$ (see Assumption 1), the optimization problem (2) can be rewritten as:

$$u^{*}(t)\in\arg\min_{\bar{u}\in\mathcal{U}_{c}}\,\phi(\bar{u},t)+\psi\left(h(\bar{u},w(t)),t\right). \tag{3}$$

Given the problem (3), we formalize our control problem.

Problem 1 (Online optimization with state perception)

Design an output feedback controller so that the inputs and states of (1) track the time-varying solution $(u^{*}(t),x^{*}(t))$ of (3) when $w(t)$ is unknown and $x$ is not measurable; instead, we have access only to state estimates $\hat{x}=\hat{p}(\zeta)$ at the time instants $\mathbb{S}=\{k\tau:k\in\mathbb{Z}_{+}\}$, $\tau>0$, where $\zeta=q(x)$ and $\hat{p}(\cdot)$ is an estimate of the map $p(\cdot)$. $\square$

Remark 1

(Implicit solution) Since the problem (3) is parametrized by the unknown exogenous input $w(t)$, the solutions of (3) cannot be computed explicitly via standard numerical optimization methods. We seek feedback controllers that drive inputs and states of (1) to solutions of (3) by relying only on estimates of the state $\hat{x}=\hat{p}(\zeta)$, and without requiring sensing of the disturbance $w(t)$. $\square$

Remark 2

(Interpretation of the control problem) Recall that (3) is parametrized by a time-varying disturbance $w(t)$ and has time-varying costs. Thus, (3) formalizes an equilibrium-seeking problem, where the objective is to select optimal input-state pairs $(u^{*}(t),x^{*}(t))$ that minimize the specified time-varying cost at each time $t$ (see, e.g., Colombino et al. (2020); Bianchin et al. (2021); Belgioioso et al. (2021)). This is a high-level regulation problem that can be nested with a stabilizing controller. $\square$

We conclude this section with some relevant assumptions.

Assumption 5

The following hold:

5(i) The function $u\mapsto\nabla\phi(u,t)$ is $\ell_{u}$-Lipschitz continuous on $\mathcal{U}$, with $\ell_{u}\geq 0$, for all $t$.

5(ii) The function $x\mapsto\nabla\psi(x,t)$ is $\ell_{x}$-Lipschitz continuous on $\mathcal{X}$, with $\ell_{x}\geq 0$, for all $t$.

5(iii) The function $u\mapsto\nabla\phi(u,t)+\partial_{u}h(u,w)^{\top}\nabla\psi(h(u,w),t)$ is $\mu$-strongly monotone with $\mu>0$ (equivalently, the composite cost $u\mapsto\phi(u,t)+\psi(h(u,w),t)$ is $\mu$-strongly convex), for all $u\in\mathcal{U}$ and for all $t$, where $\partial_{u}h(u,w)$ is the Jacobian of $h(u,w)$ with respect to $u$.

5(iv) The set $\mathcal{U}_{c}\subset\mathbb{R}^{n_{u}}$ is convex. $\square$

Note that, from Assumption 5, it follows that the gradient of the composite cost, $u\mapsto\nabla\phi(u)+\partial_{u}h(u,w)^{\top}\nabla\psi(h(u,w))$, is $\ell$-Lipschitz continuous with constant $\ell:=\ell_{u}+\ell_{h_{u}}^{2}\ell_{x}$.
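As a small worked illustration of this constant, the sketch below evaluates $\ell=\ell_{u}+\ell_{h_{u}}^{2}\ell_{x}$ together with the step-size upper limit $2\mu/\ell^{2}$ that appears in condition (10) of Theorem 1. The function names and all numerical values are hypothetical and chosen only for illustration; they are not taken from the paper's experiments.

```python
def composite_lipschitz(l_u, l_x, l_hu):
    """Lipschitz constant ell = l_u + l_hu**2 * l_x of the composite gradient."""
    return l_u + l_hu ** 2 * l_x


def max_step_size(mu, ell):
    """Upper limit 2*mu/ell**2 of the admissible step-size interval in (10)."""
    return 2.0 * mu / ell ** 2


# Hypothetical constants (assumptions, not values from the paper).
l_u, l_x, l_hu, mu = 0.0, 1.0, 2.0, 1.0

ell = composite_lipschitz(l_u, l_x, l_hu)
eta_max = max_step_size(mu, ell)
print(ell, eta_max)
```

With these illustrative values, any gain $\eta$ strictly between $0$ and `eta_max` satisfies the step-size part of (10).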

3 Perception-Based System Regulation

To address Problem 1, we consider the design of feedback controllers that are inspired by projected-gradient-type methods as in Bianchin et al. (2021); Cothren et al. (2022a). However, to acknowledge the fact that the state estimates are available only at given time intervals (for example, images from a camera are captured at a given frequency), our controller design is based on a sampled-data mechanism. The controller is equipped with a supervised learning method to estimate the perception map.

Towards this, let $\tau>0$ represent the period between two consecutive arrivals of perceptual data (for example, images) and let $k\in\mathbb{Z}_{+}$ be the sampling index, so that $\mathbb{S}=\{k\tau:k\in\mathbb{Z}_{+}\}$ is the set of times where perceptual information arrives and control inputs are updated. Accordingly, denote as $x_{k}=x(k\tau)$ and $w_{k}=w(k\tau)$ the sampled state and disturbance at time $k\tau$, and let $\phi_{k}(u):=\phi(u,k\tau)$ and $\psi_{k}(x):=\psi(x,k\tau)$ for notational brevity. We propose the following projected-gradient-type controller to generate inputs $\{u_{k}\}$ at each time $k\tau$, $k\in\mathbb{Z}_{+}$:

$$u_{k}=\Pi_{\mathcal{U}_{c}}\left\{u_{k-1}-\eta\Psi_{k}(u_{k-1},x_{k})\right\}, \tag{4}$$

where $\eta>0$ is a tunable parameter (also known as the step size in the gradient descent literature), and

$$\Psi_{k}(u,x):=\nabla\phi_{k}(u)+H(u)^{\top}\nabla\psi_{k}(x),$$

where $H(u)$ is the Jacobian of $h_{u}$ evaluated at $u$.

The controller (4) has the form of a projected-gradient-type algorithm; here, we have modified it by including the gradient evaluated at the instantaneous system state $x_{k}$ to circumvent the need to measure the exogenous input $w_{k}$. We also note that the map $\Psi_{k}$ is applied to the current state $x_{k}$ and to the previous control input $u_{k-1}$, which is applied to the system over the interval $[(k-1)\tau,k\tau)$. Critically, the controller relies on the knowledge of the system state $x(t)$ at time $k\tau$, which cannot be observed directly. To address this, we consider training a neural network to obtain an estimate $\hat{p}$ of the perception map $p$ (as in Assumption 4), which gives estimates of the state $x(t)$ from the perceptual data $\zeta(t)$.
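To make the update (4) concrete, the following is a minimal sketch of one controller step, assuming that $\mathcal{U}_{c}$ is a box (one simple compact convex set for which the Euclidean projection is a componentwise clip). The names `project_box` and `controller_step`, and the callables `grad_phi`, `grad_psi`, and `jac_h`, are hypothetical and introduced only for illustration.

```python
def project_box(u, lower, upper):
    """Euclidean projection onto the box {u : lower <= u <= upper} (assumed U_c)."""
    return [min(max(ui, lo), hi) for ui, lo, hi in zip(u, lower, upper)]


def controller_step(u_prev, x_now, eta, grad_phi, grad_psi, jac_h, lower, upper):
    """One update u_k = Proj{u_{k-1} - eta * Psi_k(u_{k-1}, x_k)} as in (4)."""
    g_phi = grad_phi(u_prev)   # gradient of phi_k at u_{k-1}
    g_psi = grad_psi(x_now)    # gradient of psi_k at the current (estimated) state
    H = jac_h(u_prev)          # Jacobian of h_u, stored as n_x rows of length n_u
    # Psi_k = grad_phi + H^T grad_psi, computed entry by entry.
    psi_k = [
        g_phi[j] + sum(H[i][j] * g_psi[i] for i in range(len(g_psi)))
        for j in range(len(u_prev))
    ]
    step = [uj - eta * gj for uj, gj in zip(u_prev, psi_k)]
    return project_box(step, lower, upper)


# Illustrative data: phi = 0, psi(x) = 0.5 * ||x - x_ref||^2, and h_u(u) = u.
x_ref = [1.0, 5.0]
u_next = controller_step(
    u_prev=[0.0, 0.0],
    x_now=[0.0, 0.0],
    eta=0.5,
    grad_phi=lambda u: [0.0, 0.0],
    grad_psi=lambda x: [x[0] - x_ref[0], x[1] - x_ref[1]],
    jac_h=lambda u: [[1.0, 0.0], [0.0, 1.0]],
    lower=[-1.0, -1.0],
    upper=[1.0, 1.0],
)
print(u_next)
```

In this example the unconstrained gradient step is $(0.5, 2.5)$, and the projection clips the second entry to the upper limit of the box.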

Towards this, we consider a set of training points $\{(x^{(i)},\zeta^{(i)})\}_{i=1}^{N}$, with $\zeta^{(i)}=q(x^{(i)})$, and we impose the following assumption to guarantee that the network training is well-posed.

Assumption 6

The training points $\{x^{(i)}\}_{i=1}^{N}$ are drawn from the compact set $\mathcal{X}_{\text{tr}}:=\mathcal{X}_{eq}+\mathcal{B}_{n}[r_{\text{tr}}]$, $r_{0}\leq r_{\text{tr}}\leq r$. $\square$

Hereafter, let $\mathcal{Q}:=q(\mathcal{X}_{\text{tr}})$ denote the perception set associated with the training set, which is a compact set. Assumption 6 allows us to leverage existing results on the bounds on the approximation error $\sup_{\zeta\in\mathcal{Q}}\|\hat{p}(\zeta)-p(\zeta)\|$ of feedforward neural networks and residual neural networks over the compact set $\mathcal{Q}$; see Hornik et al. (1989) and Marchi et al. (2022); Tabuada and Gharesifard (2020).

With an estimate $\hat{p}$ of the perception map $p$ obtained via a neural network, the proposed perception-based controller is shown in Figure 1 and is tabulated in Algorithm 1.

Algorithm 1 Regulation with NN State Perception

# Training

Given: training set $\{(x^{(i)},\zeta^{(i)})\}_{i=1}^{N}$

Obtain: $\hat{p}\leftarrow\operatorname{NN\text{-}learning}(\{(x^{(i)},\zeta^{(i)})\}_{i=1}^{N})$

# Gradient-based Sampled-Data Feedback Control

Given: set $\mathcal{U}_{c}$, functions $\phi(\cdot,t)$, $\psi(\cdot,t)$, $H(u)$, $\hat{p}$, gain $\eta$

Initial conditions: $x(0)\in\mathcal{X}_{0}$, $u(0)\in\mathcal{U}_{c}$

For $t\geq 0$, $k\in\mathbb{Z}_{+}$:

$$\dot{x}(t)=f(x(t),u(t),w(t)) \tag{5a}$$
$$\zeta_{k}=q(x(k\tau)) \tag{5b}$$
$$u_{k}=\Pi_{\mathcal{U}_{c}}\left\{u_{k-1}-\eta\Psi_{k}(u_{k-1},\hat{p}(\zeta_{k}))\right\} \tag{5c}$$
$$u(t)=u_{k},\quad t\in[k\tau,(k+1)\tau) \tag{5d}$$
Figure 1: Block diagram of the proposed perception-based feedback controller in closed loop with the plant.

In the training phase, the operation $\operatorname{NN\text{-}learning}(\cdot)$ refers to a generic training procedure for the neural network via empirical risk minimization, which results in the approximate map $\hat{p}(\cdot)$. In the proposed controller, $\hat{p}(\cdot)$ is utilized to obtain estimates of the state of the dynamical system, $\hat{x}_{k}=\hat{p}(\zeta_{k})$, which are subsequently utilized to compute the gradient map $\Psi_{k}(u,\hat{p}(\zeta))$. Note that, as in sampled-data systems, the input $u(t)$ is computed based on the control iterates $\{u_{k}\}$ as the piece-wise constant signal $u(t)=u_{k}$, $t\in[k\tau,(k+1)\tau)$, $k\in\mathbb{Z}_{+}$.
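The control phase of Algorithm 1 can be sketched on a toy example. The code below is only an illustration under strong simplifying assumptions that are not part of the paper: a scalar plant $\dot{x}=-x+u+w$ (so that $h(u,w)=u+w$ and $H=1$), a constant disturbance, a time-invariant cost $\psi(x)=\frac{1}{2}(x-x_{\text{ref}})^{2}$ with $\phi=0$, an interval as $\mathcal{U}_{c}$, and a hypothetical function `perceive` standing in for the trained composition $\hat{p}(q(\cdot))$ with a bounded error.

```python
import math


def project_interval(u, lower, upper):
    """Euclidean projection of a scalar onto the compact interval [lower, upper]."""
    return min(max(u, lower), upper)


def perceive(x, eps_bar=0.02):
    """Hypothetical stand-in for p_hat(q(x)): true state plus an error bounded by eps_bar."""
    return x + eps_bar * math.sin(7.0 * x)


def simulate(num_steps=60, tau=2.0, eta=0.5, w=0.3, x_ref=1.0):
    """Sampled-data loop (5) for the toy plant dx/dt = -x + u + w."""
    x, u = 0.0, 0.0                       # initial state and initial input
    decay = math.exp(-tau)                # exact flow factor over one hold interval
    for _ in range(num_steps):
        x_hat = perceive(x)               # (5b) followed by the perception map
        grad = x_hat - x_ref              # Psi_k with phi = 0 and H = 1
        u = project_interval(u - eta * grad, -2.0, 2.0)   # update (5c)
        x_eq = u + w                      # equilibrium h(u, w) = u + w
        x = x_eq + (x - x_eq) * decay     # (5a) under the zero-order hold (5d)
    return x, u


x_final, u_final = simulate()
print(x_final, u_final)
```

In this toy setting the optimizer of (3) is $u^{*}=x_{\text{ref}}-w$, and the iterates settle in a neighborhood of $(x_{\text{ref}},u^{*})$ whose size is governed by the perception error bound, which is qualitatively the behavior quantified in Section 4.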

4 Stability and Tracking Analysis

To analyze the performance of the closed-loop system (5), recall that $\{(u^{*}_{k},x^{*}_{k})\}$ is the sequence of optimizers of the time-varying problem (2) at the times in $\mathbb{S}$. As proposed in Belgioioso et al. (2021), we consider a discrete-time counterpart of (5); sampling (5) at times in $\mathbb{S}$ yields:

$$x_{k+1}=F(x_{k},u_{k},w_{k}), \tag{6a}$$
$$\zeta_{k}=q(x_{k}), \tag{6b}$$
$$u_{k}=\Pi_{\mathcal{U}_{c}}\left\{u_{k-1}-\eta\Psi_{k}(u_{k-1},\hat{p}(\zeta_{k}))\right\}, \tag{6c}$$

where $F(X_{0},v,w)$ denotes the solution, evaluated at time $t=t_{0}+\tau$, of the initial value problem $\dot{X}(t)=f(X(t),v,w(t))$ with $X(t_{0})=X_{0}\in\mathcal{X}_{r}$, and $\tau$ is the sampling period. The tracking results for (6) will then translate into transient bounds for the sampled-data system (5) by using (Nešić et al., 1999, Theorem 5).

Let $z_{k}:=(x_{k}-x^{*}_{k},u_{k}-u^{*}_{k})$ and define the matrix $M_{1}\in\mathbb{R}^{2\times 2}$ as

$$M_{1}:=\begin{bmatrix}c_{P}&\eta\ell_{x}\ell_{h_{u}}c_{w}/\sqrt{d_{1}}\\ c_{w}\ell_{h_{x}}\sqrt{d_{1}}(1+c_{P})&c_{w}\left(1+c_{w}\eta\ell_{x}\ell_{h_{x}}^{2}\right)\end{bmatrix}, \tag{7}$$

where $c_{w}:=e^{-d_{3}\tau/2}\sqrt{d_{2}/d_{1}}$, the constants $d_{1},d_{2},d_{3}>0$ are given in Assumption 3, and $c_{P}:=\sqrt{1-\eta(2\mu-\eta\ell^{2})}$; we further define:

$$M_{2}:=\begin{bmatrix}1&\eta\ell_{x}\ell_{h_{u}}\\ c_{w}\ell_{h_{u}}\sqrt{d_{1}}(c_{P}+1)&c_{w}\sqrt{d_{1}}\ell_{h_{u}}\eta\ell_{x}\ell_{h_{u}}\end{bmatrix}, \tag{8}$$
$$M_{3}:=\begin{bmatrix}c_{w}\eta\ell_{x}\ell_{h_{u}}^{2}(\sqrt{\tau}+1)\\ \frac{\eta\ell_{x}\ell_{h_{u}}}{\sqrt{d_{1}}}\sqrt{\tau}\end{bmatrix}. \tag{9}$$

With these definitions, our main result provides transient bounds for the system (6).

Theorem 1 (Transient bound for (6))

Consider the closed-loop system (6). Let Assumptions 1-5 hold, and assume that

$$\tau>\frac{1}{d_{3}}\log\left(\frac{d_{2}}{d_{1}}\right),\quad\eta\in\left(0,\frac{2\mu}{\ell^{2}}\right). \tag{10}$$

Then, the tracking error satisfies

$$\|z_{k}\|\leq\frac{r_{M_{1}}m_{2}}{m_{1}}c_{M_{1}}^{k+1}\|z_{0}\|+b\|M_{3}\|\sigma_{w}\left(\sup_{0\leq s\leq\tau k}\|\dot{w}(s)\|\right)+b\|M_{2}\|\left\|\begin{bmatrix}\sup_{1\leq s\leq k}\|u_{s}^{*}-u_{s-1}^{*}\|\\ \sup_{\zeta\in\mathcal{Q}}\|\hat{p}(\zeta)-p(\zeta)\|+\bar{\varepsilon}\end{bmatrix}\right\|, \tag{11}$$

for any $u(0)\in\mathcal{U}_{c}$ and $x(0)\in\mathcal{X}_{eq}+\mathcal{B}_{n}(r_{I})$ with a sufficiently small $0<r_{I}<r_{0}$, where $m_{1}:=\min\{1,\sqrt{d_{1}}\}$, $m_{2}:=\max\{1,\sqrt{d_{2}}\}$, $b:=\frac{r_{M_{1}}c_{M_{1}}}{m_{1}(1+c_{M_{1}})}$, and the constants $r_{M_{1}}>0$ and $c_{M_{1}}\in[0,1)$ are such that $\|M_{1}^{k}\|\leq r_{M_{1}}c_{M_{1}}^{k}$ for all $k\in\mathbb{Z}_{+}$. $\square$

Theorem 1 guarantees exponential convergence of the sampled trajectory of the tracking error to a neighborhood of zero. The size of the neighborhood depends on three terms: $\sup_{\zeta\in\mathcal{Q}}\|\hat{p}(\zeta)-p(\zeta)\|+\bar{\varepsilon}$, which corresponds to the error associated with the state estimation; $\sup_{1\leq s\leq k}\|u_{s}^{*}-u_{s-1}^{*}\|$, which captures the time-variability of the optimizer; and $\sigma_{w}(\sup_{0\leq s\leq\tau k}\|\dot{w}(s)\|)$, which corresponds to the time-variability of the unknown disturbance. The proof is provided in the extended version of the paper, Cothren et al. (2022b).
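For a given set of constants, the quantities entering Theorem 1 can be checked numerically. The sketch below evaluates the sampling-period threshold in (10), the factors $c_{w}$ and $c_{P}$, and the spectral radius of $M_{1}$ in (7). All numerical values are hypothetical, the helper names are introduced only for illustration, and $\ell_{h_{x}}$ is treated as a user-supplied constant exactly as it is written in (7).

```python
import cmath
import math


def spectral_radius_2x2(m):
    """Largest eigenvalue modulus of a real 2x2 matrix via trace and determinant."""
    trace = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    root = cmath.sqrt(trace * trace - 4.0 * det)
    return max(abs((trace + root) / 2.0), abs((trace - root) / 2.0))


def build_m1(eta, tau, mu, ell, l_x, l_hu, l_hx, d1, d2, d3):
    """Assemble M_1 from (7) along with the factors c_w and c_P."""
    c_w = math.exp(-d3 * tau / 2.0) * math.sqrt(d2 / d1)
    c_p = math.sqrt(1.0 - eta * (2.0 * mu - eta * ell ** 2))
    m1 = [
        [c_p, eta * l_x * l_hu * c_w / math.sqrt(d1)],
        [c_w * l_hx * math.sqrt(d1) * (1.0 + c_p),
         c_w * (1.0 + c_w * eta * l_x * l_hx ** 2)],
    ]
    return m1, c_w, c_p


# Hypothetical constants (assumptions for illustration only).
d1, d2, d3 = 1.0, 4.0, 2.0
mu, ell, l_x, l_hu, l_hx = 1.0, 4.0, 1.0, 2.0, 2.0
eta, tau = 0.0625, 3.0

tau_min = math.log(d2 / d1) / d3          # threshold on tau from (10)
m1, c_w, c_p = build_m1(eta, tau, mu, ell, l_x, l_hu, l_hx, d1, d2, d3)
rho = spectral_radius_2x2(m1)
print(tau > tau_min, c_w, c_p, rho)
```

For these illustrative values both conditions in (10) hold, $c_{w}<1$ and $c_{P}<1$, and the spectral radius of $M_{1}$ is below one, which is consistent with the existence of constants $r_{M_{1}}$ and $c_{M_{1}}\in[0,1)$ as required in the theorem.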

The transient bound (11) shows that the sampled system (6) is input-to-state stable (ISS), in the sense of Jiang and Wang (2001), with respect to $\|\dot{w}\|$, the drift on the optimal solution $\|u_{k}^{*}-u_{k-1}^{*}\|$, and the error introduced by the estimated perception map. To translate the results of Theorem 1 into transient bounds for the continuous system (5), we leverage Theorem 5 of Nešić et al. (1999). In particular, let $z(t):=(x(t)-x^{*}(t),u_{k}-u^{*}(t))$, where $u^{*}(t)$ is piece-wise constant and such that $u^{*}(t)=u^{*}_{k}$ for $t\in[k\tau,(k+1)\tau)$, and $x^{*}(t)$ is defined similarly. Then, we obtain the following.

Theorem 2 (Transient bound for (5))

Consider the closed-loop system (5). Let Assumptions 1-5 hold, and assume that $\tau,\eta$ satisfy (10). Then, there exist $\beta\in\mathcal{KL}$ and $\gamma_{w},\gamma_{u},\gamma_{p}\in\mathcal{K}_{\infty}$ such that

$$\|z(t)\|\leq\beta(\|z_{0}\|,t)+\gamma_{w}\left(\sup_{0\leq s\leq t}\|\dot{w}(s)\|\right)+\gamma_{u}\left(\Delta_{u}^{*}\right)+\gamma_{p}\left(\sup_{\zeta\in\mathcal{Q}}\|\hat{p}(\zeta)-p(\zeta)\|+\bar{\varepsilon}\right) \tag{12}$$

holds for any $u(0)\in\mathcal{U}_{c}$ and $x(0)\in\mathcal{X}_{eq}+\mathcal{B}_{n}(r_{I})$ with a sufficiently small $0<r_{I}<r_{0}$, provided that $\Delta_{u}^{*}:=\sup_{k\in\mathbb{Z}_{+}}\|u^{*}_{k}-u^{*}_{k-1}\|\leq r_{u}$, $\sup_{0\leq s\leq t}\|\dot{w}(s)\|\leq r_{w}$, and $\sup_{\zeta\in\mathcal{Q}}\|\hat{p}(\zeta)-p(\zeta)\|+\bar{\varepsilon}\leq r_{p}$ for some finite $r_{w},r_{u},r_{p}>0$. $\square$

Mirroring (11), the bound (12) shows that the system (5) is ISS with respect to $\|\dot{w}\|$, the drift on the optimal solution $\Delta_{u}^{*}$, and the error introduced by the neural network.

To provide a connection with the existing literature, we point out that (12) generalizes the following sub-cases: (i) when the state $x(t)$ can be observed (without errors), then (12) reduces to a bound similar to Bianchin et al. (2021) (where, however, the controller is a continuous-time gradient flow); (ii) when the state $x(t)$ can be observed and the functions $\phi(u)$ and $\psi(x)$ are time invariant, then (12) boils down to the ISS result of Belgioioso et al. (2021) (for sampled-data controllers) and Colombino et al. (2020) (for continuous-time gradient flows) and Zheng et al. (2020).

Remark 3

When a residual network is utilized to estimate the map $p$, the error $\sup_{\zeta\in\mathcal{Q}}\|\hat{p}(\zeta)-p(\zeta)\|$ on the compact set $\mathcal{Q}$ in (12) can be bounded as shown in Marchi et al. (2022); Tabuada and Gharesifard (2020), under given conditions on the selection of the training points. The bound in Marchi et al. (2022); Tabuada and Gharesifard (2020) is particularly interesting because it ties the approximation error with the training error and the geometry of the residual network. We do not include a customization of the bound (12) using the results of Tabuada and Gharesifard (2020) due to space limitations. $\square$

Theorem 2 follows from the results of Theorem 1 and Theorem 5 of Nešić et al. (1999) by noticing that (5) is uniformly bounded over an interval $\tau$ in the sense of (Nešić et al., 1999, Definition 2). This is because $u(t)$ is piece-wise constant and takes values from the compact set $\mathcal{U}_{c}$, $\mathcal{W}_{c}$ is compact (and $\|\dot{w}\|$ is bounded), and the perception error is bounded; boundedness of $x(t)$ with respect to the sampled sequence $\{x_{k}\}$ follows from (Belgioioso et al., 2021, Lemma 1). The proof is omitted due to space limitations.

5 Application to Autonomous Driving

We utilize our controller in Algorithm 1 to control a vehicle to track a set of reference points while avoiding obstacles; the position of the vehicle is accessible only through camera images. The vehicle’s movement is modeled by the unicycle dynamics with state x=(a,b,θ)x=(a,b,\theta), where r:=(a,b)2r:=(a,b)\in\mathbb{R}^{2} is the position in the 2D plane and θ(π,π]\theta\in(-\pi,\pi] is the orientation with respect to the aa-axis: a˙=vcos(θ)\dot{a}=v\cos(\theta), b˙=vsin(θ),θ˙=w\dot{b}=v\sin(\theta),\,\,\dot{\theta}=w, where v,wv,w\in\mathbb{R} are controllable inputs. Importantly, we assume that we do not have direct knowledge of the state x=(a,b,θ)x=(a,b,\theta), but instead, camera images, represented by the generative map ζ=q(x)\zeta=q(x), which return the position r=(a,b)r=(a,b). We consider a lower-level stabilizer to stabilize the unicycle dynamics to satisfy Assumption 3. Following (Cothren et al., 2022a, Lemma 2), let u=(ua,ub)u=(u_{a},u_{b}) denote the control inputs for each direction aa and bb, and consider the change of coordinates to the error variables, ξ:=ux\xi:=\|u-x\|, ϕ:=atan(ubbuaa)θ\phi:=\operatorname{atan}\left(\frac{u_{b}-b}{u_{a}-a}\right)-\theta. The dynamics of these are ξ˙=vcos(ϕ)\dot{\xi}=-v\cos(\phi) and ϕ˙=vξsin(ϕ)w\dot{\phi}=\frac{v}{\xi}\sin(\phi)-w. By setting

$$v=\kappa\xi\cos(\phi),\quad w=\kappa(\cos(\phi)+1)\sin(\phi)+\kappa\phi, \tag{13}$$

for some $\kappa>0$, the unicycle dynamics admit a globally exponentially stable equilibrium point $(\xi,\phi)=(0,0)$. With the inputs $v,w\in\mathbb{R}$ set as in (13), the dynamic plant satisfies Assumption 3.
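The effect of the stabilizer (13) can be explored with a short simulation. The sketch below is only illustrative and rests on several assumptions that are not part of the paper: the target $u$ is held fixed, the dynamics are integrated with forward Euler, `atan2` with angle wrapping replaces the `atan` expression, and the gain, step size, and initial condition are hypothetical.

```python
import math


def wrap_angle(angle):
    """Wrap an angle to the interval [-pi, pi)."""
    return (angle + math.pi) % (2.0 * math.pi) - math.pi


def stabilizer(state, target, kappa):
    """Return (v, w) from the feedback law (13) for a fixed target u = (u_a, u_b)."""
    a, b, theta = state
    u_a, u_b = target
    xi = math.hypot(u_a - a, u_b - b)  # distance to the target
    # Heading error; atan2 plus wrapping is an assumption made for robustness.
    phi = wrap_angle(math.atan2(u_b - b, u_a - a) - theta)
    v = kappa * xi * math.cos(phi)
    w = kappa * (math.cos(phi) + 1.0) * math.sin(phi) + kappa * phi
    return v, w


def simulate(state, target, kappa=1.0, dt=0.01, steps=2000):
    """Forward-Euler integration of the unicycle under the stabilizer."""
    a, b, theta = state
    for _ in range(steps):
        v, w = stabilizer((a, b, theta), target, kappa)
        a += dt * v * math.cos(theta)
        b += dt * v * math.sin(theta)
        theta = wrap_angle(theta + dt * w)
    return a, b, theta


# Hypothetical initial state and target.
target = (1.0, 2.0)
final_state = simulate((0.0, 0.0, 2.0), target)
final_distance = math.hypot(target[0] - final_state[0], target[1] - final_state[1])
```

Substituting (13) into the error dynamics gives $\dot{\xi}=-\kappa\xi\cos^{2}(\phi)$ and $\dot{\phi}=-\kappa(\sin(\phi)+\phi)$, which is why the simulated distance to the target shrinks.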

We consider a sequence of locations that the vehicle should follow, denoted as $\mathcal{T}_{d}:=\{r_{d,k}\in\mathcal{R},k\in\mathbb{Z}_{+}\}$, where $\mathcal{R}:=\{r=(a,b):x=(a,b,\theta)\in\mathcal{X}\}$. To avoid obstacles around the vehicle, at each time $\tau k$, $k\in\mathbb{Z}_{+}$, we build the free workspace of the vehicle, defined as $\mathcal{F}_{k}(r_{k}):=\{r:a_{k}^{(i)}(r_{k})^{\top}r-b^{(i)}(r_{k})\geq 0,i=1,\dots,M_{k}\}$, where $M_{k}$ is the number of obstacles at time $\tau k$, $r^{(i)}$ is the center position of the $i$th obstacle, $a_{k}^{(i)}(r_{k})=r^{(i)}-r_{k}$, $b^{(i)}(r_{k})$ is a scalar computed from the positions of the vehicle and the obstacles, and $r_{k}\in\mathcal{R}$ is the position at time $\tau k$. The free workspace $\mathcal{F}_{k}(r_{k})$ describes a local neighborhood of the vehicle that is guaranteed to be free of obstacles.

To track the desired trajectory 𝒯d\mathcal{T}_{d} while avoiding obstacles, we utilize the following waypoint-tracking formulation with a barrier function

$$\psi_{k}(r):=\frac{1}{2}\|r-r_{d,k}\|^{2}-\lambda_{k}\sum_{i=1}^{M_{k}}\log\left(b^{(i)}(r_{k})-a_{k}^{(i)}(r_{k})^{\top}r\right),$$

where $\lambda_{k}>0$ is a tuning parameter; in particular, in the simulations we set $\lambda_{k}=1/\exp(0.1k)$. We set $\phi_{k}(u)=0$.
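To make the cost concrete, the following sketch evaluates the barrier-augmented tracking cost and its gradient with respect to $r$ in plain Python. It follows the displayed formula with the logarithm applied to $b^{(i)}-a^{(i)\top}r$; the half-space data, the waypoint, and the evaluation point are hypothetical values chosen only for illustration, since the paper does not specify how $b^{(i)}$ is computed.

```python
import math


def psi(r, r_des, lam, halfspaces):
    """Evaluate 0.5*||r - r_des||^2 - lam * sum_i log(b_i - a_i^T r)."""
    cost = 0.5 * ((r[0] - r_des[0]) ** 2 + (r[1] - r_des[1]) ** 2)
    for a, b in halfspaces:
        slack = b - (a[0] * r[0] + a[1] * r[1])
        if slack <= 0.0:
            return math.inf  # outside the domain of the log barrier
        cost -= lam * math.log(slack)
    return cost


def grad_psi(r, r_des, lam, halfspaces):
    """Gradient of psi: (r - r_des) + lam * sum_i a_i / (b_i - a_i^T r)."""
    grad = [r[0] - r_des[0], r[1] - r_des[1]]
    for a, b in halfspaces:
        slack = b - (a[0] * r[0] + a[1] * r[1])
        if slack <= 0.0:
            raise ValueError("point lies outside the barrier domain")
        grad[0] += lam * a[0] / slack
        grad[1] += lam * a[1] / slack
    return grad


# Hypothetical data: two half-space constraints (a_i, b_i) and one waypoint.
halfspaces_example = [((1.0, 0.0), 3.0), ((0.0, 1.0), 2.0)]
r_example = (1.0, 0.5)
r_des_example = (2.0, 1.0)
lam_example = math.exp(-0.1 * 5)  # lambda_k = 1/exp(0.1 k) with k = 5
gradient_example = grad_psi(r_example, r_des_example, lam_example, halfspaces_example)
```

As $k$ grows, $\lambda_{k}$ decays and the barrier term contributes less to the gradient away from the constraint boundaries, so the cost is increasingly dominated by the tracking term.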

Figure 2: Sample trajectories of the vehicle tracking the waypoints, represented by red stars. The magenta dashed path is the full execution of Algorithm 1 with perception, and the yellow path is with perfect state feedback. Red dashed lines identify obstacles used to construct $\mathcal{F}_{k}$.
Figure 3: Error of the position $\|r(t)-r^{*}(t)\|$. Jumps in the error coincide with variations of the reference $r_{d,k}$.

We used a residual neural network to estimate the position of the vehicle from aerial images; specifically, the network returns estimated coordinates in the $(a,b)$ plane. For training, we generated $94{,}500$ images of size $64\times 64$ pixels depicting a red bot in the roundabout setting in Figure 2. The images were built using the MATLAB Image Processing Toolbox and basic plotting functions therein by setting the background as an aerial view of a roundabout and plotting a red square for the vehicle. We used the resnet50 structure given in the MATLAB Deep Learning Toolbox and tailored the input ($64\times 64\times 3$ RGB images) and output (total number of labels) sizes to our specific case. For labels, we selected the pixels corresponding to parts of the image containing the road, so that the network trains only on data corresponding to allowable surfaces for the robot; this gave a total of $135$ unique labels. Finally, we selected five checkpoints along the road for the robot to follow (denoted by the red stars in Figure 2) during the execution of the algorithm; these correspond to $\{r_{d,k}\}$.

Simulation results for Algorithm 1 are given in Fig. 2. The dashed magenta line corresponds to the trajectory of the unicycle for the neural network-assisted controller, and the yellow line is for the controller with perfect state information. Fig. 3 shows the error $\|r(t)-r^{*}(t)\|$ between the optimal and actual position of the vehicle. Importantly, the trajectory tracks the time-varying reference points up to an error and avoids the obstacles.

6 Conclusion

We proposed an algorithm to regulate dynamical systems towards the solution of a convex optimization problem when we do not have full knowledge of the system states. Specifically, we developed a sampled-data feedback controller that is augmented with a neural network to estimate the state of the system from high-dimensional sensory data. Our results guarantee exponential convergence of the interconnected system up to an error term that depends on the temporal variability of the problem and on the error incurred when estimating the state with the neural network.

References

  • Agarwal et al. (2022) Agarwal, A., Simpson-Porco, J.W., and Pavel, L. (2022). Game-theoretic feedback-based optimization. IFAC-PapersOnLine, 55(13), 174–179.
  • Al Makdah et al. (2020) Al Makdah, A.A., Katewa, V., and Pasqualetti, F. (2020). Accuracy prevents robustness in perception-based control. In American Control Conference, 3940–3946.
  • Belgioioso et al. (2021) Belgioioso, G., Liao-McPherson, D., de Badyn, M.H., Bolognani, S., Lygeros, J., and Dörfler, F. (2021). Sampled-data online feedback equilibrium seeking: Stability and tracking. In IEEE Conference on Decision and Control.
  • Bianchin et al. (2021) Bianchin, G., Cortés, J., Poveda, J.I., and Dall’Anese, E. (2021). Time-varying optimization of LTI systems via projected primal-dual gradient flows. IEEE Trans. on Control of Networked Systems.
  • Bianchin et al. (2022) Bianchin, G., Poveda, J.I., and Dall’Anese, E. (2022). Online optimization of switched LTI systems using continuous-time and hybrid accelerated gradient flows. Automatica, 146, 110579.
  • Brunner et al. (2012) Brunner, F.D., Dürr, H.B., and Ebenbauer, C. (2012). Feedback design for multi-agent systems: A saddle point approach. In IEEE Conference on Decision and Control, 3783–3789.
  • Chou et al. (2022) Chou, G., Ozay, N., and Berenson, D. (2022). Safe output feedback motion planning from images via learned perception modules and contraction theory. arXiv preprint arXiv:2206.06553.
  • Colombino et al. (2020) Colombino, M., Dall’Anese, E., and Bernstein, A. (2020). Online optimization as a feedback controller: Stability and tracking. IEEE Trans. on Control of Networked Systems, 7(1), 422–432.
  • Cothren et al. (2022a) Cothren, L., Bianchin, G., and Dall’Anese, E. (2022a). Online optimization of dynamical systems with deep learning perception. IEEE Open Journal of Control Systems, 1, 306–321.
  • Cothren et al. (2022b) Cothren, L., Bianchin, G., Dean, S., and Dall’Anese, E. (2022b). Perception-based sampled-data optimization of dynamical systems (longer version). arXiv:2211.10020.
  • Dawson et al. (2022) Dawson, C., Lowenkamp, B., Goff, D., and Fan, C. (2022). Learning safe, generalizable perception-based hybrid control with certificates. IEEE Robotics and Automation Letters, 7(2), 1904–1911.
  • Dean and Recht (2021) Dean, S. and Recht, B. (2021). Certainty equivalent perception-based control. In Learning for Dynamics and Control, 399–411. PMLR.
  • Hauswirth et al. (2021) Hauswirth, A., Bolognani, S., Hug, G., and Dörfler, F. (2021). Timescale separation in autonomous optimization. IEEE Trans. on Automatic Control, 66(2), 611–624.
  • Hirata et al. (2014) Hirata, K., Hespanha, J.P., and Uchida, K. (2014). Real-time pricing leading to optimal operation under distributed decision makings. In American Control Conf.
  • Hornik et al. (1989) Hornik, K., Stinchcombe, M., and White, H. (1989). Multilayer feedforward networks are universal approximators. Neural networks, 2(5), 359–366.
  • Jiang and Wang (2001) Jiang, Z.P. and Wang, Y. (2001). Input-to-state stability for discrete-time nonlinear systems. Automatica, 37(6), 857–869.
  • Jokic et al. (2009) Jokic, A., Lazar, M., and Van Den Bosch, P.P.J. (2009). On constrained steady-state regulation: Dynamic KKT controllers. IEEE Trans. on Automatic Control, 54(9), 2250–2254.
  • Khalil (2002) Khalil, H.K. (2002). Nonlinear Systems; 3rd ed. Prentice-Hall, Upper Saddle River, NJ.
  • Lawrence et al. (2020) Lawrence, L.S., Simpson-Porco, J.W., and Mallada, E. (2020). Linear-convex optimal steady-state control. IEEE Transactions on Automatic Control, 66(11), 5377–5384.
  • Marchi et al. (2022) Marchi, M., Bunton, J., Gharesifard, B., and Tabuada, P. (2022). Safety and stability guarantees for control loops with deep learning perception. IEEE Control Systems Letters, 6, 1286–1291.
  • Murillo-González and Poveda (2022) Murillo-González, A. and Poveda, J.I. (2022). Data-assisted vision-based hybrid control for robust stabilization with obstacle avoidance via learning of perception maps. In 2022 American Control Conference (ACC), 886–892. IEEE.
  • Nešić et al. (1999) Nešić, D., Teel, A.R., and Sontag, E.D. (1999). Formulas relating KL stability estimates of discrete-time and sampled-data nonlinear systems. Systems & Control Letters, 38(1), 49–60.
  • Rudin (1976) Rudin, W. (1976). Principles of mathematical analysis, volume 3. McGraw-Hill New York.
  • Simpson-Porco (2021) Simpson-Porco, J.W. (2021). Low-gain stability of projected integral control for input-constrained discrete-time nonlinear systems. IEEE Control Systems Letters, 6, 788–793.
  • Tabuada and Gharesifard (2020) Tabuada, P. and Gharesifard, B. (2020). Universal approximation power of deep residual neural networks via nonlinear control theory. arXiv preprint arXiv:2007.06007.
  • Xu et al. (2021) Xu, J., Lee, B., Matni, N., and Jayaraman, D. (2021). How are learned perception-based controllers impacted by the limits of robust control? In Learning for Dynamics and Control, 954–966.
  • Zheng et al. (2020) Zheng, T., Simpson-Porco, J., and Mallada, E. (2020). Implicit trajectory planning for feedback linearizable systems: A time-varying optimization approach. In American Control Conference, 4677–4682.

Appendix A Proofs

Proof of Theorem 1.

The proof of Theorem 1 is divided into two main parts. In the first part, we construct intermediate bounds to be used in the second part of the proof.

(Part 1.a: Lyapunov Function) The first step is to leverage the results in Lemma 1 of Belgioioso et al. (2021). By Assumption 3, we have that for a fixed $u\in\mathcal{U}$,

$$\dot{V}(t)\leq-d_{3}V(t)+\sigma_{w}(\|\dot{w}\|). \tag{14}$$

Given the initial condition $V(t_{0})=V_{0}$, (14) implies that:

\begin{align}
V(x(t),u,w_{t})&\leq V_{0}e^{-d_{3}(t-t_{0})}+\int_{t_{0}}^{t}\sigma_{w}(\|\dot{w}(s)\|)\,ds \tag{15}\\
&\leq V_{0}e^{-d_{3}(t-t_{0})}+\sigma_{w}(\bar{w}), \tag{16}
\end{align}

where $\bar{w}:=\sup_{s\in[t_{0},t]}\|\dot{w}(s)\|$. Define $V_{k}:=V(x_{k},u_{k},w_{k})$ and $W_{k}:=\sqrt{V_{k}}$. Then, from Lemma 1 of Belgioioso et al. (2021), it follows that

$$W_{k+1}\leq c_{w}W_{k}+c_{w}\ell_{h_{u}}\sqrt{d_{1}}\|u_{k+1}-u_{k}\|+\sqrt{\tau}\sigma_{w}^{\prime}(\bar{w}),$$

where $c_{w}:=e^{-d_{3}\tau/2}\sqrt{d_{2}/d_{1}}$, $\ell_{h_{u}}$ is the Lipschitz constant of the steady state map w.r.t. $u$, and $\sigma_{w}^{\prime}:=\sqrt{\sigma_{w}}\in\mathcal{K}$.
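The factor $c_{w}$ is smaller than one exactly when $e^{-d_{3}\tau/2}<\sqrt{d_{1}/d_{2}}$, that is, when $\tau>\frac{1}{d_{3}}\log(d_{2}/d_{1})$. The sketch below evaluates $c_{w}$ and this threshold; the function names and the numerical values of $d_{1},d_{2},d_{3}$ are hypothetical and serve only as an illustration.

```python
import math


def contraction_factor(tau, d1, d2, d3):
    """c_w = exp(-d3 * tau / 2) * sqrt(d2 / d1)."""
    return math.exp(-0.5 * d3 * tau) * math.sqrt(d2 / d1)


def critical_sampling_period(d1, d2, d3):
    """Sampling period at which c_w equals one: log(d2 / d1) / d3."""
    return math.log(d2 / d1) / d3


# Hypothetical Lyapunov constants with d1 <= d2 and d3 > 0.
d1, d2, d3 = 0.5, 2.0, 1.5
tau_crit = critical_sampling_period(d1, d2, d3)
c_w_slow = contraction_factor(1.1 * tau_crit, d1, d2, d3)  # sampling period above the threshold
c_w_fast = contraction_factor(0.9 * tau_crit, d1, d2, d3)  # sampling period below the threshold
```

A sampling period above the threshold gives the plant enough time between controller updates for the Lyapunov function to contract despite the ratio $d_{2}/d_{1}\geq 1$.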

(Part 1.b: Bound for $\|u_{k+1}-u_{k}\|$) To simplify notation, rewrite the controller as $u_{k}:=T_{k}(u_{k-1},\hat{x}_{k})$, where $T_{k}(u,x)=\Pi_{\mathcal{U}_{c}}\left\{u-\eta\Psi_{k}(u,x)\right\}$. Moreover, let $e_{x,k}$ denote the error in the gradient introduced by the perception map. Using this notation, calculate:

\begin{align*}
\|u_{k+1}-u_{k}\| &= \|T_{k+1}(u_{k},\hat{x}_{k+1})-u_{k}\| \\
&= \|T_{k+1}(u_{k},\hat{x}_{k+1})-T_{k+1}(u_{k+1}^{*},h(u_{k+1}^{*},w_{k+1})) \\
&\quad +u_{k+1}^{*}-u_{k}^{*}+u_{k}^{*}-u_{k}\| \\
&= \|T_{k+1}(u_{k},\hat{x}_{k+1})-T_{k+1}(u_{k},h(u_{k},w_{k+1})) \\
&\quad +T_{k+1}(u_{k},h(u_{k},w_{k+1}))-T_{k+1}(u_{k+1}^{*},h(u_{k+1}^{*},w_{k+1})) \\
&\quad +u_{k+1}^{*}-u_{k}^{*}+u_{k}^{*}-u_{k}\| \\
&\leq \|T_{k+1}(u_{k},\hat{x}_{k+1})-T_{k+1}(u_{k},h(u_{k},w_{k+1}))\| \\
&\quad +\|T_{k+1}(u_{k},h(u_{k},w_{k+1}))-T_{k+1}(u_{k+1}^{*},h(u_{k+1}^{*},w_{k+1}))\| \\
&\quad +\|u_{k+1}^{*}-u_{k}^{*}\|+\|u_{k}^{*}-u_{k}\| \\
&\leq \eta\ell_{x}\ell_{h_{u}}\|x_{k+1}+e_{x,k+1}-h(u_{k},w_{k+1})\| \\
&\quad +\|T_{k+1}(u_{k},h(u_{k},w_{k+1}))-T_{k+1}(u_{k+1}^{*},h(u_{k+1}^{*},w_{k+1}))\| \\
&\quad +\|u_{k+1}^{*}-u_{k}^{*}\|+\|u_{k}^{*}-u_{k}\| \\
&\leq \eta\ell_{x}\ell_{h_{u}}\|x_{k+1}-h(u_{k},w_{k+1})\|+\eta\ell_{x}\ell_{h_{u}}\|e_{x,k+1}\| \\
&\quad +\|T_{k+1}(u_{k},h(u_{k},w_{k+1}))-T_{k+1}(u_{k+1}^{*},h(u_{k+1}^{*},w_{k+1}))\| \\
&\quad +\|u_{k+1}^{*}-u_{k}^{*}\|+\|u_{k}^{*}-u_{k}\|.
\end{align*}

Consider the first term, $\|x_{k+1}-h(u_{k},w_{k+1})\|$. We further bound this by applying Assumption 3 and by using the fact that $W:=\sqrt{V}$:

\begin{align*}
\|x_{k+1}-h(u_{k},w_{k+1})\| &\leq\frac{1}{\sqrt{d_{1}}}W(x_{k+1},u_{k},w_{k+1}) \\
&\leq\frac{1}{\sqrt{d_{1}}}\left(c_{w}W(x_{k},u_{k},w_{k})+\sqrt{\tau}\sigma_{w}^{\prime}(\bar{w})\right).
\end{align*}

To bound the second line of the final inequality, recall that

$$\|T_{k+1}(u,h(u,w_{k+1}))-T_{k+1}(y,h(y,w_{k+1}))\|\leq c_{P}\|u-y\|$$

holds for any $u,y\in\mathcal{U}$ due to Assumption 3. Thus,

\begin{align*}
\|T_{k+1}(u_{k},h(u_{k},w_{k+1}))-T_{k+1}(u_{k+1}^{*},h(u_{k+1}^{*},w_{k+1}))\| &\leq c_{P}\|u_{k}-u_{k+1}^{*}\| \\
&\leq c_{P}\left(\|u_{k+1}^{*}-u_{k}^{*}\|+\|u_{k}^{*}-u_{k}\|\right).
\end{align*}

In total,

\begin{equation}
\begin{split}
\|u_{k+1}-u_{k}\| &\leq\frac{\eta\ell_{x}\ell_{h_{u}}}{\sqrt{d_{1}}}\left(c_{w}W(x_{k},u_{k},w_{k})+\sqrt{\tau}\sigma_{w}^{\prime}(\bar{w})\right) \\
&\quad+\eta\ell_{x}\ell_{h_{u}}\|e_{x,k+1}\| \\
&\quad+(c_{P}+1)\left(\|u_{k+1}^{*}-u_{k}^{*}\|+\|u_{k}^{*}-u_{k}\|\right).
\end{split}
\tag{17}
\end{equation}
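The operator $T_{k}$ used throughout Part 1.b can be sketched in a few lines. The example below is a toy instance under loud assumptions, none of which come from the paper: the set $\mathcal{U}_{c}$ is a box, the plant is static with steady-state map $h(u)=u$, the gradient map is $\Psi(u,x)=x-r_{d}$ for a hypothetical target $r_{d}$, and the perception error is a constant bias.

```python
def project_box(u, lower, upper):
    """Euclidean projection onto the box [lower, upper] (assumed form of U_c)."""
    return [min(max(ui, lo), hi) for ui, lo, hi in zip(u, lower, upper)]


def controller_step(u, x_hat, eta, grad_map, lower, upper):
    """One application of T(u, x) = Proj_{U_c}{u - eta * Psi(u, x)}."""
    grad = grad_map(u, x_hat)
    return project_box([ui - eta * gi for ui, gi in zip(u, grad)], lower, upper)


def run(u0, target, bias, eta=0.5, steps=60):
    """Iterate the controller on a static toy plant x = h(u) = u with biased perception."""
    lower, upper = [-1.0, -1.0], [1.0, 1.0]

    def grad_map(u, x_hat):
        # Hypothetical gradient map: gradient of 0.5 * ||x - target||^2 at x = x_hat.
        return [xi - ti for xi, ti in zip(x_hat, target)]

    u = list(u0)
    for _ in range(steps):
        x_hat = [ui + bi for ui, bi in zip(u, bias)]  # perceived state = true state + bias
        u = controller_step(u, x_hat, eta, grad_map, lower, upper)
    return u


u_exact = run([0.0, 0.0], target=[0.5, -0.2], bias=[0.0, 0.0])
u_biased = run([0.0, 0.0], target=[0.5, -0.2], bias=[0.05, 0.0])
u_saturated = run([0.0, 0.0], target=[2.0, -3.0], bias=[0.0, 0.0])
```

In this toy setting the bias moves the fixed point from the target to the target minus the bias, which mirrors how the term $\|e_{x,k+1}\|$ enters the bound (17); when the target lies outside the box, the iterates settle on the boundary of the box.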

(Part 1.c: Bound for $\|u_{k+1}-u_{k+1}^{*}\|$) We obtain the following bound by using steps similar to those used above to bound $\|u_{k+1}-u_{k}\|$:

\begin{align*}
\|u_{k+1}-u_{k+1}^{*}\| &= \|T_{k+1}(u_{k},\hat{x}_{k+1})-u_{k+1}^{*}\| \\
&= \|T_{k+1}(u_{k},\hat{x}_{k+1})-T_{k+1}(u_{k},h(u_{k},w_{k+1})) \\
&\quad +T_{k+1}(u_{k},h(u_{k},w_{k+1}))-u_{k+1}^{*}\| \\
&\leq \|T_{k+1}(u_{k},\hat{x}_{k+1})-T_{k+1}(u_{k},h(u_{k},w_{k+1}))\| \\
&\quad +\|T_{k+1}(u_{k},h(u_{k},w_{k+1}))-T_{k+1}(u_{k+1}^{*},h(u_{k+1}^{*},w_{k+1}))\| \\
&\leq \frac{\eta\ell_{x}\ell_{h_{u}}}{\sqrt{d_{1}}}\left(c_{w}W(x_{k},u_{k},w_{k})+\sqrt{\tau}\sigma_{w}^{\prime}(\bar{w})\right) \\
&\quad +\eta\ell_{x}\ell_{h_{u}}\|e_{x,k+1}\|+c_{P}\left(\|u_{k+1}^{*}-u_{k}^{*}\|+\|u_{k}^{*}-u_{k}\|\right).
\end{align*}

(Part 1.d: Overall bounds) Putting together the bounds above, we have:

\begin{align}
W_{k+1} &\leq c_{w}\left(1+c_{w}\eta\ell_{x}\ell_{h_{u}}^{2}\right)W_{k}+\sqrt{\tau}\left(1+c_{w}\eta\ell_{x}\ell_{h_{u}}^{2}\right)\sigma_{w}^{\prime}(\bar{w}) \notag\\
&\quad+c_{w}\ell_{h_{u}}\sqrt{d_{1}}(c_{P}+1)\|u_{k}-u_{k}^{*}\| \notag\\
&\quad+c_{w}\eta\sqrt{d_{1}}\ell_{x}\ell_{h_{u}}\|e_{x,k+1}\| \notag\\
&\quad+c_{w}\ell_{h_{u}}\sqrt{d_{1}}(c_{P}+1)\|u_{k+1}^{*}-u_{k}^{*}\|, \tag{18a}\\
\|u_{k+1}-u_{k+1}^{*}\| &\leq\frac{\eta\ell_{x}\ell_{h_{u}}}{\sqrt{d_{1}}}\left(c_{w}W(x_{k},u_{k},w_{k})+\sqrt{\tau}\sigma_{w}^{\prime}(\bar{w})\right) \notag\\
&\quad+\eta\ell_{x}\ell_{h_{u}}\|e_{x,k+1}\| \notag\\
&\quad+c_{P}\left(\|u_{k+1}^{*}-u_{k}^{*}\|+\|u_{k}^{*}-u_{k}\|\right). \tag{18b}
\end{align}

Based on these intermediate results, conditions for ISS are derived next.

(Part 2: Sufficient conditions for stability) Define:

$$\omega_{k}:=\begin{bmatrix}\|u_{k}-u_{k}^{*}\|\\ W_{k}\end{bmatrix},\quad\nu_{k}:=\begin{bmatrix}\|u_{k+1}^{*}-u_{k}^{*}\|\\ \|e_{x,k+1}\|\end{bmatrix},\quad\sigma_{k}:=\begin{bmatrix}0\\ \sigma_{w}^{\prime}(\bar{w})\end{bmatrix}.$$

Then, rewrite (18) as,

$$\omega_{k+1}\leq M_{1}\omega_{k}+M_{2}\nu_{k}+M_{3}\sigma_{k}. \tag{19}$$
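The matrix $M_{1}$ is not written out in this section. One plausible reading, used in the sketch below, is to collect the coefficients that multiply $\|u_{k}-u_{k}^{*}\|$ and $W_{k}$ in (18b) and (18a), in that row order; this assembly is an assumption, as are all numerical constants. The spectral radius of the resulting $2\times 2$ matrix is then computed in closed form from its trace and determinant.

```python
import cmath
import math


def build_M1(c_P, c_w, eta, ell_x, ell_hu, d1):
    """Assumed M1: coefficients of (||u_k - u_k^*||, W_k) in (18b) (row 1) and (18a) (row 2)."""
    return [
        [c_P, eta * ell_x * ell_hu * c_w / math.sqrt(d1)],
        [c_w * ell_hu * math.sqrt(d1) * (c_P + 1.0),
         c_w * (1.0 + c_w * eta * ell_x * ell_hu ** 2)],
    ]


def spectral_radius_2x2(M):
    """Largest eigenvalue magnitude of a 2x2 matrix via the characteristic polynomial."""
    trace = M[0][0] + M[1][1]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    root = cmath.sqrt(trace * trace - 4.0 * det)
    return max(abs((trace + root) / 2.0), abs((trace - root) / 2.0))


# Hypothetical constants chosen only to illustrate the check.
M1 = build_M1(c_P=0.5, c_w=0.3, eta=0.1, ell_x=1.0, ell_hu=1.0, d1=1.0)
rho = spectral_radius_2x2(M1)
is_schur = rho < 1.0
```

With these illustrative values the spectral radius is below one, so the homogeneous part of (19) contracts; for other constants the same check indicates whether the step size and sampling period need to be adjusted.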

By applying (19) recursively $k+1$ times, we obtain:

\begin{align}
\omega_{k+1} &\leq(M_{1})^{k+1}\omega_{0} \tag{20}\\
&\quad+\sum_{s=0}^{k+1}(M_{1})^{k+1-s}M_{2}\nu_{s}+\sum_{s=0}^{k+1}(M_{1})^{k+1-s}M_{3}\sigma_{s}. \tag{21}
\end{align}
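The mechanism behind the unrolled recursion can be checked on a scalar analogue. The sketch below assumes a scalar recursion $\omega_{k+1}=c\,\omega_{k}+m\,\nu_{k}$ with equality and $0\leq c<1$, all constants being hypothetical; it compares the iterates with their closed form (the decayed initial condition plus the inputs applied so far, each weighted by a power of $c$) and with the uniform bound obtained from the geometric series $\sum_{j\geq 0}c^{j}=1/(1-c)$.

```python
import math


def iterate(omega0, c, m, nu):
    """Run omega_{k+1} = c * omega_k + m * nu_k and return the whole trajectory."""
    trajectory = [omega0]
    for nu_k in nu:
        trajectory.append(c * trajectory[-1] + m * nu_k)
    return trajectory


def unrolled(omega0, c, m, nu, k):
    """Closed form of the k-th iterate: decayed initial condition plus weighted inputs."""
    return c ** k * omega0 + sum(c ** (k - 1 - s) * m * nu[s] for s in range(k))


def uniform_bound(omega0, c, m, nu_bar, k):
    """Geometric-series bound c^k * omega0 + m * nu_bar / (1 - c), valid for 0 <= c < 1."""
    return c ** k * omega0 + m * nu_bar / (1.0 - c)


# Hypothetical scalar data: contraction factor, input gain, and a bounded input sequence.
c, m, omega0 = 0.8, 0.5, 3.0
nu = [0.2 + 0.1 * math.sin(s) for s in range(50)]
nu_bar = max(nu)
trajectory = iterate(omega0, c, m, nu)
```

The first term of the bound decays geometrically with $k$, while the second stays proportional to the supremum of the inputs, which is the input-to-state stability structure the proof establishes in the matrix case.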

Next, recall from (Jiang and Wang, 2001, Ex. 3.4) that for a Schur matrix $M_{1}$, there exist constants $r_{M_{1}}>0$ and $c_{M_{1}}\in[0,1)$ such that $\|(M_{1})^{k}\|\leq r_{M_{1}}c_{M_{1}}^{k}$. By direct computation of the eigenvalues of $M_{1}$, we may enforce $M_{1}$ to be Schur when $\tau>0$ satisfies:

$$\tau>\frac{1}{d_{3}}\log\left(\frac{d_{2}}{d_{1}}\right). \tag{22}$$

This condition critically relies on the fact that $\eta>0$ is chosen so that $c_{P}\in(0,1)$. Then, taking the norm on both sides (since the quantities are non-negative), and using the triangle inequality, one gets:

\begin{equation}
\begin{split}
\|\omega_{k+1}\| &\leq r_{M_{1}}(c_{M_{1}})^{k+1}\|\omega_{0}\| \\
&\quad+r_{M_{1}}\sum_{s=0}^{k+1}(c_{M_{1}})^{k+1-s}\|M_{2}\|\|\nu_{s}\| \\
&\quad+r_{M_{1}}\sum_{s=0}^{k+1}(c_{M_{1}})^{k+1-s}\|M_{3}\|\|\sigma_{s}\|.
\end{split}
\tag{23}
\end{equation}

Define the following:

\begin{align}
\bar{\nu} &:=\sup_{0\leq s\leq k}\|\nu_{s}\|, & \bar{M}_{2} &:=\|M_{2}\|, \tag{24a}\\
\bar{\sigma} &:=\sup_{0\leq s\leq k}\|\sigma_{s}\|, & \bar{M}_{3} &:=\|M_{3}\|. \tag{24b}
\end{align}

Then, (23) becomes,

\begin{equation}
\begin{split}
\|\omega_{k+1}\| &\leq r_{M_{1}}(c_{M_{1}})^{k+1}\|\omega_{0}\|+r_{M_{1}}\left(\sum_{s=1}^{k+1}(c_{M_{1}})^{s}\right)\bar{M}_{2}\|\bar{\nu}\| \\
&\quad+r_{M_{1}}\left(\sum_{s=1}^{k+1}(c_{M_{1}})^{s}\right)\bar{M}_{3}\|\bar{\sigma}\|.
\end{split}
\tag{25}
\end{equation}

Equivalently, factoring out $c_{M_{1}}$ and shifting the summation index,

\begin{equation}
\begin{split}
\|\omega_{k+1}\| &\leq r_{M_{1}}(c_{M_{1}})^{k+1}\|\omega_{0}\| \\
&\quad+r_{M_{1}}c_{M_{1}}\left(\sum_{s=0}^{k}(c_{M_{1}})^{s}\right)\bar{M}_{2}\|\bar{\nu}\| \\
&\quad+r_{M_{1}}c_{M_{1}}\left(\sum_{s=0}^{k}(c_{M_{1}})^{s}\right)\bar{M}_{3}\|\bar{\sigma}\|.
\end{split}
\tag{26}
\end{equation}

Then, bounding the finite sums in the second and third terms by the geometric series $\sum_{s=0}^{\infty}(c_{M_{1}})^{s}=1/(1-c_{M_{1}})$, we obtain:

\begin{equation}
\begin{split}
\|\omega_{k+1}\| &\leq r_{M_{1}}(c_{M_{1}})^{k+1}\|\omega_{0}\| \\
&\quad+r_{M_{1}}\frac{c_{M_{1}}}{1-c_{M_{1}}}\left(\bar{M}_{2}\|\bar{\nu}\|+\bar{M}_{3}\|\bar{\sigma}\|\right).
\end{split}
\tag{27}
\end{equation}

Finally, apply the quadratic Lyapunov bounds to obtain the final form. By Assumption 3, we may write:

$$m_{1}\left\|\begin{bmatrix}x_{k}-x_{k}^{*}\\ u_{k}-u_{k}^{*}\end{bmatrix}\right\|\leq\|\omega_{k}\|\leq m_{2}\left\|\begin{bmatrix}x_{k}-x_{k}^{*}\\ u_{k}-u_{k}^{*}\end{bmatrix}\right\|, \tag{28}$$

where

$$m_{1}:=\min\left\{1,\sqrt{d_{1}}\right\},\quad m_{2}:=\max\left\{1,\sqrt{d_{2}}\right\}. \tag{29}$$

Substituting (28) into (27), we obtain:

\begin{equation}
\begin{split}
\left\|\begin{bmatrix}x_{k}-x_{k}^{*}\\ u_{k}-u_{k}^{*}\end{bmatrix}\right\| &\leq\frac{r_{M_{1}}m_{2}}{m_{1}}(c_{M_{1}})^{k+1}\left\|\begin{bmatrix}x_{0}-x_{0}^{*}\\ u_{0}-u_{0}^{*}\end{bmatrix}\right\| \\
&\quad+\frac{r_{M_{1}}c_{M_{1}}\bar{M}_{2}}{m_{1}(1-c_{M_{1}})}\left\|\begin{bmatrix}\sup_{0\leq s\leq k}\|u_{s}^{*}-u_{s-1}^{*}\|\\ \sup_{0\leq s\leq k}\|e_{x,s}\|\end{bmatrix}\right\| \\
&\quad+\frac{r_{M_{1}}c_{M_{1}}\bar{M}_{3}}{m_{1}(1-c_{M_{1}})}\left\|\begin{bmatrix}0\\ \sigma_{w}(\bar{w})\end{bmatrix}\right\|.
\end{split}
\tag{30}
\end{equation}

The bound then follows by noticing that, since $\zeta_{k}=q(x_{k})$, adding and subtracting $p(\zeta_{k})$ and applying the triangle inequality gives

\begin{align*}
\|e_{x,k}\| &= \|\hat{x}_{k}-x_{k}\| \\
&= \|\hat{p}(\zeta_{k})-x_{k}\| \\
&\leq \|\hat{p}(\zeta_{k})-p(\zeta_{k})\|+\|p(q(x_{k}))-x_{k}\|,
\end{align*}

where the second term satisfies $\|p(q(x_{k}))-x_{k}\|\leq\varepsilon$ and the first term is bounded by $\sup_{\zeta\in\mathcal{Q}}\|\hat{p}(\zeta)-p(\zeta)\|$.