Perception-Based Sampled-Data Optimization of Dynamical Systems

Liliaokeawawa Cothren, Gianluca Bianchin, Sarah Dean, Emiliano Dall’Anese

Department of ECEE, University of Colorado Boulder; ICTEAM Institute, Université Catholique de Louvain; Department of Computer Science, Cornell University

Abstract

Motivated by perception-based and sensing-based control problems in autonomous systems, this paper addresses the problem of developing feedback controllers to regulate the inputs and the states of a dynamical system to optimal solutions of an optimization problem when one has no access to exact measurements of the system states. In particular, we consider the case where the states need to be estimated from high-dimensional sensory data received only at some time instants. We develop a sampled-data feedback controller that is based on adaptations of a projected gradient descent method and includes neural networks as integral components to estimate the state of the system from perceptual information. We derive sufficient conditions to guarantee (local) input-to-state stability of the control loop. Moreover, we show that the interconnected system tracks the solution trajectory of the underlying optimization problem up to an error that depends on the approximation errors of the neural network and on the time-variability of the optimization problem; the latter originates from time-varying safety and performance objectives, input constraints, and unknown disturbances.

Keywords: Learning for control, data-driven control, feedback optimization, output regulation.

This work was supported in part by the National Science Foundation through the award CMMI 2044946. Corresponding author: Liliaokeawawa Cothren (Liliaokeawawa.Cothren@colorado.edu).

1 Introduction

A major challenge in controlling complex autonomous systems consists of incorporating rich perceptual and sensing data into the control loop. The performance of feedback control systems critically relies on the information extracted from perceptual sensing, which may require the processing of high-dimensional data only available at given spatio-temporal granularities; see, e.g., Dean and Recht (2021); Al Makdah et al. (2020); Xu et al. (2021); Dawson et al. (2022). This paper investigates how to integrate perceptual information into controllers inspired by optimization algorithms, where the goal is to steer a dynamical system toward solutions of an optimization problem with costs associated with the system’s inputs and states. For example, in autonomous driving, the optimization problem may formalize way-point tracking and obstacle avoidance, with the relevant information provided by images from a dashboard camera. Other examples include robotics and power systems (the latter leveraging pseudo-measurements).

The line of research on feedback optimization goes back to earlier concepts of KKT-type controllers in Jokic et al. (2009); Brunner et al. (2012); Hirata et al. (2014), and it was recently expanded to include new classes of controllers inspired by first-order optimization methods in Colombino et al. (2020); Zheng et al. (2020); Hauswirth et al. (2021); Lawrence et al. (2020); Bianchin et al. (2021, 2022); Belgioioso et al. (2021); Agarwal et al. (2022); Simpson-Porco (2021); also see references therein. In this paper, we provide contributions relative to existing works by considering a setup where the cost of the optimization problem evolves over time (for instance, for way-point tracking and to avoid moving obstacles) and the state of the system cannot be directly measured. The latter is a distinctive feature of this work: we address the case in which optimization-based controllers must leverage perceptual information available at given temporal granularities and learning mechanisms to estimate the system state.

We develop a sampled-data feedback controller that is based on an adaptation of a projected gradient descent method. Based on the specified time-varying costs, the gradient-based controller generates inputs for the system which are then passed through a zero-order hold. Importantly, the controller leverages a trained neural network that maps perceptual information into estimates of the state of the system. We derive sufficient conditions to guarantee (local) input-to-state stability (ISS) of the control loop. In particular, we show that the interconnected system tracks the optimal solution trajectory of the optimization problem up to an error that depends on the approximation errors of the neural network and the time-variability of the cost and unknown disturbance. The ISS bounds are derived by leveraging the fundamental results of Jiang and Wang (2001) and Nešić et al. (1999).

We note that a similar perception-based regulation problem was considered in our prior work in Cothren et al. (2022a); however, in Cothren et al. (2022a) controllers operate in continuous time and thus do not account for the sampled-data nature of the feedback information. A sampled-data controller was developed in Belgioioso et al. (2021); with respect to Belgioioso et al. (2021), we consider cases where the optimization problem is time-varying and the state is estimated via perception maps.

We test our controller on an autonomous driving application where vehicles are modeled via unicycle dynamics; the controller acquires the position of the vehicle from a neural network that estimates positions from images.

2 Problem Formulation

In the following, we formalize our research problem and discuss the necessary assumptions.

Notation. We denote by $\mathbb{N},\mathbb{Z}_{+},\mathbb{R},\mathbb{R}_{>0},\text{ and }\mathbb{R}_{\geq 0}$ the sets of natural numbers, positive integers, real numbers, positive real numbers, and non-negative real numbers, respectively. For vectors $x\in\mathbb{R}^{n}$ and $u\in\mathbb{R}^{m}$ , $\|x\|$ is the Euclidean norm of $x$ , $\|x\|_{\infty}$ is the supremum norm, and $(x,u)\in\mathbb{R}^{n+m}$ is their vector concatenation. $x^{\top}$ denotes transposition, and $x_{i}$ denotes the $i$ -th element of $x$ . For a matrix $A\in\mathbb{R}^{n\times m}$ , $\|A\|$ is the induced $2$ -norm and $\|A\|_{\infty}$ is the supremum norm. The set $\mathcal{B}_{n}(r):=\{z\in\mathbb{R}^{n}:\|z\|<r\}$ is the open ball in $\mathbb{R}^{n}$ with radius $r>0$ ; $\mathcal{B}_{n}[r]:=\{z\in\mathbb{R}^{n}:\|z\|\leq r\}$ is the closed ball. Given two sets $\mathcal{X}\subset\mathbb{R}^{n}$ and $\mathcal{Y}\subset\mathbb{R}^{m}$ , $\mathcal{X}\times\mathcal{Y}$ is their Cartesian product; $\mathcal{X}+\mathcal{B}_{n}(r)$ is the open set defined as $\mathcal{X}+\mathcal{B}_{n}(r)=\{x+y:x\in\mathcal{X},y\in\mathcal{B}_{n}(r)\}$ . $\Pi_{\mathcal{U}}$ is the Euclidean projection of $z\in\mathbb{R}^{n}$ onto $\mathcal{U}\subset\mathbb{R}^{n}$ ; that is, $\Pi_{\mathcal{U}}(z):=\operatorname{arg}\min_{u\in\mathcal{U}}\|u-z\|^{2}$ . A function $\gamma:\mathbb{R}_{\geq 0}\rightarrow\mathbb{R}_{\geq 0}$ is of class $\mathcal{K}$ if it is continuous, $\gamma(0)=0$ , and strictly increasing; it is of class $\mathcal{K}_{\infty}$ if it is additionally unbounded. A function $\beta:\mathbb{R}_{\geq 0}\times\mathbb{R}_{\geq 0}\to\mathbb{R}_{\geq 0}$ is of class $\mathcal{KL}$ if for each fixed $t$ the function $\beta(r,t)$ is of class $\mathcal{K}$ , and if for each fixed $r$ the function $\beta(r,t)$ is decreasing with respect to $t$ and is s.t. $\beta(r,t)\rightarrow 0$ for $t\rightarrow\infty$ .

We consider systems that can be modeled using dynamics of the form:

$\displaystyle\dot{x}=f(x,u,w),\,\,x(0)=x_{0},$ (1)

where $x:\mathbb{R}_{\geq 0}\to\mathcal{X}\subseteq\mathbb{R}^{n_{x}}$ is the state, $u:\mathbb{R}_{\geq 0}\to\mathcal{U}\subseteq\mathbb{R}^{n_{u}}$ is the control input, $w:\mathbb{R}_{\geq 0}\to\mathcal{W}\subseteq\mathbb{R}^{n_{w}}$ is a time-varying unknown exogenous disturbance, and where the vector field $f:\mathcal{X}\times\mathcal{U}\times\mathcal{W}\to\mathbb{R}^{n_{x}}$ is continuously differentiable on the open and connected domain $\mathcal{X}\times\mathcal{U}\times\mathcal{W}$ and is Lipschitz-continuous in its arguments with constants $L_{x},L_{u}$ , and $L_{w}$ respectively. In this paper, motivated by practical hardware and operational requirements, we restrict our attention to cases where $u\in{\mathcal{U}}_{\textup{c}}$ at all times, where ${\mathcal{U}}_{\textup{c}}\subset\mathcal{U}$ is compact. We impose the following additional assumptions on the system (1).

Assumption 1

There exists a unique continuously differentiable map $h:\mathcal{U}\times\mathcal{W}\to\mathcal{X}$ such that, for any (constant) $\bar{u}\in\mathcal{U}$ and $\bar{w}\in\mathcal{W}$ , $f\left(h(\bar{u},\bar{w}),\bar{u},\bar{w}\right)=0.$ Moreover, $h(u,w)$ admits the decomposition $h(u,w)=h_{u}(u)+h_{w}(w),$ where $h_{u}$ and $h_{w}$ are Lipschitz continuous with constants $\ell_{h_{u}}$ and $\ell_{h_{w}}$ , respectively. $\square$

Assumption 2

For all $t\in\mathbb{R}_{\geq 0}$ , $w(t)\in{\mathcal{W}}_{\textup{c}}$ , where ${\mathcal{W}}_{\textup{c}}\subset\mathcal{W}$ is compact. Moreover, $t\mapsto w(t)$ is continuous. $\square$

Assumption 1 guarantees that, for constant inputs $\bar{u}$ and $\bar{w}$ , system (1) admits the unique equilibrium point $\bar{x}:=h(\bar{u},\bar{w})$ . Note that when $\nabla_{x}f(x,\bar{u},\bar{w})$ is invertible for any $\bar{u}$ and $\bar{w}$ , then the existence of $h(\bar{u},\bar{w})$ is always guaranteed. Furthermore, by the implicit function theorem, $h(\bar{u},\bar{w})$ is differentiable since $f(x,\bar{u},\bar{w})$ is differentiable. We also note that the equilibrium set $\mathcal{X}_{eq}:=\{h(\bar{u},\bar{w}):\bar{u}\in\mathcal{U}_{c},\,\,\bar{w}\in\mathcal{W}_{c}\}$ is compact; this is due to $\mathcal{U}_{c}\times\mathcal{W}_{c}$ being compact, $h(u,w)$ being continuously differentiable, and the result (Rudin, 1976, Thm. 4.14). For any $u\in\mathcal{U}_{c}$ , we have that $\|\nabla_{u}h(u,\bar{w})\|\leq\ell_{h_{u}}$ since $\mathcal{U}_{c}$ is compact Rudin (1976).
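As a concrete illustration of Assumption 1, the sketch below uses a hypothetical scalar plant $\dot{x}=-ax+bu+ew$ with $a>0$; the plant and its coefficients are assumptions made for illustration and are not taken from the paper. For this plant the steady-state map is $h(u,w)=(b/a)u+(e/a)w$, which has the additive form $h_{u}(u)+h_{w}(w)$ with $\ell_{h_{u}}=|b|/a$ and $\ell_{h_{w}}=|e|/a$.

```python
# Hypothetical scalar plant x' = -a*x + b*u + e*w (illustrative values only).
A_COEF, B_COEF, E_COEF = 2.0, 1.0, 0.5


def f(x, u, w):
    """Vector field of the illustrative plant."""
    return -A_COEF * x + B_COEF * u + E_COEF * w


def h_u(u):
    """Input part of the steady-state map; Lipschitz constant |b|/a."""
    return (B_COEF / A_COEF) * u


def h_w(w):
    """Disturbance part of the steady-state map; Lipschitz constant |e|/a."""
    return (E_COEF / A_COEF) * w


def h(u, w):
    """Steady-state map h(u, w) = h_u(u) + h_w(w)."""
    return h_u(u) + h_w(w)


u_bar, w_bar = 0.8, -0.3
x_bar = h(u_bar, w_bar)
# The residual f(h(u, w), u, w) should vanish at the equilibrium.
residual = f(x_bar, u_bar, w_bar)
print(x_bar, residual)
```

Since $\partial f/\partial x=-a\neq 0$ in this example, the equilibrium is unique for each constant pair $(\bar{u},\bar{w})$, in line with the invertibility remark above.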

Next, as customary in the context of feedback optimization (see, e.g., Colombino et al. (2020); Hauswirth et al. (2021)), we require a stability condition on the system to control. To this end, let $\mathcal{X}_{r}=\mathcal{X}_{eq}+\mathcal{B}_{n}(r)\subseteq\mathcal{X}$ , $r>0$ , be a set for which the following assumption holds.

Assumption 3

There exists a continuously differentiable function $V:\mathcal{X}_{r}\times\mathcal{U}\times\mathcal{W}\to\mathbb{R}$ with constants $d_{1},d_{2},d_{3}>0$ and a $\mathcal{K}$ -function $\sigma_{w}$ such that:

1. For all $x\in\mathcal{X}$ , $u\in\mathcal{U}$ , and $w\in\mathcal{W}$ , $d_{1}\|\tilde{x}\|^{2}\leq V(\tilde{x},u,w)\leq d_{2}\|\tilde{x}\|^{2},$ where $\tilde{x}:=x-h(u,w)$ ;

2. For any constant $u\in\mathcal{U}$ , $\dot{V}(x(t),u,w(t))\leq-d_{3}V(x(t),u,w(t))+\sigma_{w}(\|\dot{w}(t)\|).$ $\square$

From Khalil (2002), Assumption 3 implies that there exist constants $k,a,\gamma>0$ such that

$\displaystyle\|\tilde{x}(t)\|\leq k\|\tilde{x}(0)\|e^{-at}+\gamma\sigma_{w}\left(\sup_{0\leq\tau\leq t}\|\dot{w}(\tau)\|\right)$

for $x(0)\in\mathcal{X}_{0}:=\mathcal{X}_{eq}+\mathbb{B}_{n}(r_{0})$ , with $r_{0}<(r-\text{diam}({\mathcal{X}}_{\textup{eq}}))/k-\bar{\gamma}$ and $\bar{\gamma}:=\gamma\sigma_{w}\left(\sup_{t\geq 0}\|\dot{w}(t)\|\right)$ .

We point out that when the (physical) system does not satisfy Assumption 3, then (1) models the pre-stabilized physical system.

2.1 Generative and Perception Maps

In this paper, we assume that the state $x$ of (1) is not directly measurable. Instead, one has access to nonlinear and possibly high-dimensional observations of the state $\zeta=q(x)$ , where $q:\mathcal{X}\rightarrow\mathbb{R}^{n_{\zeta}}$ is an unknown generative map. This setup emerges when information about the state is acquired through perceptual information from sensing and estimation mechanisms. For example, in applications in autonomous driving, vehicle states are often reconstructed from images generated by cameras. See, for example, the models in Dean and Recht (2021); Al Makdah et al. (2020); Murillo-González and Poveda (2022); also see the closely related observer design problems in Marchi et al. (2022); Chou et al. (2022).

Regarding the unknown map $x\mapsto q(x)$ , we make the following assumption (see also Dean and Recht (2021)).

Assumption 4

The map $q:\mathcal{X}\to\mathbb{R}^{n_{\zeta}}$ is such that the image $q(\mathcal{X}^{\prime})$ is compact for any compact set $\mathcal{X}^{\prime}\subset\mathcal{X}$ . Further, there exists a map $p:\mathbb{R}^{n_{\zeta}}\to\mathbb{R}^{n_{x}}$ such that $p(\zeta)=p(q(x))=x+\varepsilon(x)$ , where $\varepsilon(x)$ is bounded as $\|\varepsilon(x)\|\leq\bar{\varepsilon}$ for any $x\in\mathcal{X}$ , for a given finite $\bar{\varepsilon}\geq 0$ . $\square$

The function $p(\zeta)$ in Assumption 4 is referred to as the perception map; for a given observation $\zeta$ , it yields a possibly noisy estimate of the state, up to a bounded error $\varepsilon(x)$ . In this paper, we will leverage supervised learning methods to estimate the perception map $p(\zeta)$ from data.
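To make the roles of $q$ and $p$ concrete, the following toy sketch renders a scalar position in $[0,1]$ as a one-hot row of pixels and recovers it from the brightest pixel. Both maps, the pixel count, and the function names are hypothetical and chosen only for illustration; in the paper $q$ is unknown and $p$ is approximated by a neural network rather than hand-coded. With $n$ pixels, the quantization error gives $\bar{\varepsilon}=0.5/(n-1)$ in this example.

```python
N_PIXELS = 64  # illustrative image width


def q(x, n_pixels=N_PIXELS):
    """Toy generative map: scalar position x in [0, 1] -> one-hot 'image'."""
    idx = round(x * (n_pixels - 1))
    return [1.0 if i == idx else 0.0 for i in range(n_pixels)]


def p(zeta):
    """Toy perception map: position of the brightest pixel, rescaled to [0, 1]."""
    idx = max(range(len(zeta)), key=lambda i: zeta[i])
    return idx / (len(zeta) - 1)


# Bound on the perception error eps(x) = p(q(x)) - x for this toy pair.
eps_bar = 0.5 / (N_PIXELS - 1)

# Empirical check of the bound on a grid of states.
grid = [i / 1000 for i in range(1001)]
worst = max(abs(p(q(x)) - x) for x in grid)
print(worst, eps_bar)
```

The observation has dimension $n_{\zeta}=64$ while the state is scalar, which mirrors (in a very reduced form) the high-dimensional observations considered here.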

2.2 Regulation to Solutions of an Optimization Problem

We focus on regulating the system (1) to the solution of the following time-varying optimization problem:


$\displaystyle(u^{*}(t),x^{*}(t))\in\arg\underset{\bar{u}\in\mathcal{U}_{c}}{\min}~~\phi(\bar{u},t)+\psi(\bar{x},t)$ (2a)

$\displaystyle\text{s.t.}~~\bar{x}=h(\bar{u},w(t)),$ (2b)

where $u\mapsto\phi(u,t)$ and $x\mapsto\psi(x,t)$ are functions that describe costs associated with the system’s inputs and states, respectively. We remark that (2) is a time-varying optimization problem for two reasons: (i) the cost functions are time-varying, which allows us to account for performance and safety objectives that evolve over time, and (ii) the constraint is time-varying since the system’s equilibrium point is parametrized by the time-varying signal $w(t)$ . Accordingly, (2) defines optimal trajectories $t\mapsto(u^{*}(t),x^{*}(t))$ for the system (1). Note that, since $h(u,w)$ is unique for any fixed $u$ and $w$ (see Assumption 1), the optimization problem (2) can be rewritten as:

$\displaystyle u^{*}(t)\in\arg\underset{\bar{u}\in\mathcal{U}_{c}}{\min}\,\phi(\bar{u},t)+\psi\left(h(\bar{u},w(t)),t\right).$ (3)
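As a worked illustration of the reduced problem, consider a hypothetical scalar instance with a quadratic input cost, a quadratic tracking cost, and an affine steady-state map. The symbols $\rho>0$, $g$, $x_{\mathrm{ref}}(t)$, $u_{\min}$, and $u_{\max}$ are assumptions introduced only for this example.

```latex
% Illustrative scalar instance (rho, g, x_ref, u_min, u_max are assumed symbols).
\phi(u,t)=\tfrac{\rho}{2}u^{2},\qquad
\psi(x,t)=\tfrac{1}{2}\bigl(x-x_{\mathrm{ref}}(t)\bigr)^{2},\qquad
h(u,w)=g\,u+w,\qquad
\mathcal{U}_{c}=[u_{\min},u_{\max}].

% Stationarity of the unconstrained reduced cost:
\rho u+g\bigl(g u+w(t)-x_{\mathrm{ref}}(t)\bigr)=0
\;\Longrightarrow\;
u=\frac{g\,\bigl(x_{\mathrm{ref}}(t)-w(t)\bigr)}{\rho+g^{2}}.

% The reduced cost is strongly convex in the scalar u, so the constrained
% minimizer is the unconstrained one clipped to the interval:
u^{*}(t)=\min\Bigl\{u_{\max},\,\max\Bigl\{u_{\min},\,
\frac{g\,\bigl(x_{\mathrm{ref}}(t)-w(t)\bigr)}{\rho+g^{2}}\Bigr\}\Bigr\}.
```

In this instance $u^{*}(t)$ depends explicitly on $w(t)$, so it cannot be evaluated offline when the disturbance is unknown.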

Given the problem (3), we formalize our control problem.

Problem 1 (Online optimization with state perception)

Design an output feedback controller so that the inputs and states of (1) track the time-varying solution $(u^{*}(t),x^{*}(t))$ of (3) when $w(t)$ is unknown and $x$ is not measurable; instead, we have access only to state estimates $\hat{x}=\hat{p}(\zeta)$ at certain instants $\mathbb{S}=\{k\tau:k\in\mathbb{Z}_{+}\}$ , $\tau>0,$ where $\zeta=q(x)$ and $\hat{p}(\cdot)$ is an estimate of the map $p(\cdot)$ . $\square$

Remark 1

(Implicit solution) Since the problem (3) is parametrized by the unknown exogenous inputs $w(t)$ , the solutions of (3) cannot be computed explicitly via standard numerical optimization methods. We seek feedback controllers that drive inputs and states of (1) to solutions of (3) by relying only on estimates of the state $\hat{x}=\hat{p}(\zeta)$ , and without requiring sensing of the disturbance $w(t)$ . $\square$

Remark 2

(Interpretation of the control problem) Recall that (3) is parametrized by a time-varying disturbance $w(t)$ and has time-varying costs. Thus, (3) formalizes an equilibrium seeking problem, where the objective is to select optimal input-state pairs $(u^{*}(t),x^{*}(t))$ that minimize the specified time-varying cost at each time $t$ (see, e.g., Colombino et al. (2020); Bianchin et al. (2021); Belgioioso et al. (2021)). This is a high-level regulation problem that can be nested with a stabilizing controller. $\square$

We conclude this section with some relevant assumptions.

Assumption 5

The following hold:

5(i) The function $u\mapsto\nabla\phi(u,t)$ is $\ell_{u}$ -Lipschitz continuous for all $u\in\mathcal{U}$ , $\ell_{u}\geq 0$ , for all $t$ .

5(ii) The function $x\mapsto\nabla\psi(x,t)$ is $\ell_{x}$ -Lipschitz continuous for all $x\in\mathcal{X}$ , $\ell_{x}\geq 0$ , for all $t$ .

5(iii) The map $u\mapsto\nabla\phi(u,t)+\partial_{u}h(u,w)^{\top}\nabla\psi(h(u,w),t)$ is $\mu$ -strongly monotone with $\mu>0$ (that is, the composite cost $u\mapsto\phi(u,t)+\psi(h(u,w),t)$ is $\mu$ -strongly convex), for all $u\in\mathcal{U}$ and for all $t$ , where $\partial_{u}h(u,w)$ is the Jacobian of $h(u,w)$ w.r.t. $u$ .

5(iv) The set $\mathcal{U}_{c}\subset\mathbb{R}^{n_{u}}$ is convex. $\square$

Note that, from Assumption 5, it follows that the gradient of the composite cost, $u\mapsto\nabla\phi(u,t)+\partial_{u}h(u,w)^{\top}\nabla\psi(h(u,w),t)$ , is $\ell$ -Lipschitz continuous with constant $\ell:=\ell_{u}+\ell_{h_{u}}^{2}\ell_{x}$ .
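The constant $\ell=\ell_{u}+\ell_{h_{u}}^{2}\ell_{x}$ can be checked directly in the special case of a linear input map. The sketch below assumes $h_{u}(u)=Gu$ for a constant matrix $G$ with $\|G\|\leq\ell_{h_{u}}$, and uses the shorthand $\nabla\Phi_{t}(u):=\nabla\phi(u,t)+G^{\top}\nabla\psi(h(u,w),t)$; both the linearity and the shorthand are assumptions made for this illustration.

```latex
% Assumption for this sketch: h_u(u) = G u, hence \partial_u h(u,w) = G and \|G\| \le \ell_{h_u}.
\begin{aligned}
\|\nabla\Phi_{t}(u)-\nabla\Phi_{t}(u')\|
&\leq \|\nabla\phi(u,t)-\nabla\phi(u',t)\|
   +\bigl\|G^{\top}\bigl(\nabla\psi(h(u,w),t)-\nabla\psi(h(u',w),t)\bigr)\bigr\| \\
&\leq \ell_{u}\|u-u'\|+\|G\|\,\ell_{x}\,\|h(u,w)-h(u',w)\| \\
&= \ell_{u}\|u-u'\|+\|G\|\,\ell_{x}\,\|G(u-u')\| \\
&\leq \bigl(\ell_{u}+\ell_{h_{u}}^{2}\ell_{x}\bigr)\|u-u'\|.
\end{aligned}
```

The first inequality is the triangle inequality, the second uses Assumptions 5(i) and 5(ii), and the equality uses $h(u,w)-h(u',w)=G(u-u')$, which follows from the additive decomposition in Assumption 1.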

3 Perception-Based System Regulation

To address Problem 1, we consider the design of feedback controllers that are inspired by projected-gradient-type methods as in Bianchin et al. (2021); Cothren et al. (2022a). However, to acknowledge the fact that the state estimates are available only at given time intervals (for example, images from a camera are captured at a given frequency), our controller design is based on a sampled-data mechanism. The controller is equipped with a supervised learning method to estimate the perception map.

Towards this, let $\tau>0$ represent the period between two consecutive arrivals of perceptual data (for example, images) and let $k\in\mathbb{Z}_{+}$ be the sampling index, so that $\mathbb{S}=\{k\tau:k\in\mathbb{Z}_{+}\}$ is the set of times where perceptual information arrives and control inputs are updated. Accordingly, denote as $x_{k}=x(k\tau)$ and $w_{k}=w(k\tau)$ the sampled states and disturbance at time $k\tau$ , and let $\phi_{k}(u):=\phi(u,k\tau)$ and $\psi_{k}(x):=\psi(x,k\tau)$ for notational brevity. We propose the following projected-gradient-type controller to generate inputs $\{u_{k}\}$ at each time $k\tau$ , $k\in\mathbb{Z}_{+}$ :

$\displaystyle u_{k}=\Pi_{\mathcal{U}_{c}}\left\{u_{k-1}-\eta\Psi_{k}(u_{k-1},x_{k})\right\},$ (4)

where $\eta>0$ is a tunable parameter (also known as step size in the gradient descent literature), and

$\displaystyle\Psi_{k}(u,x):=\nabla\phi_{k}(u)+H(u)^{\top}\nabla\psi_{k}(x),$

where $H(u)$ is the Jacobian of $h_{u}$ evaluated at $u$ .

The controller (4) is of the form of a projected gradient-type algorithm; here, we have modified it by including the gradient evaluated at the instantaneous system state $x_{k}$ to circumvent the need to measure the exogenous input $w_{k}$ . We also note that the map $\Psi_{k}$ is applied to the current state $x_{k}$ , and the previous control input $u_{k-1}$ , which is applied to the system over the interval $[(k-1)\tau,k\tau)$ . Critically, the controller relies on the knowledge of the system state $x(t)$ at time $\tau k$ , which cannot be observed directly. To address this, we consider training a neural network to obtain an estimate $\hat{p}$ of the perception map $p$ (as in Assumption 4) which gives estimates of state $x(t)$ from the perceptual data $\zeta(t)$ .
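A single update of the form (4) can be sketched as follows. The sketch assumes that $\mathcal{U}_{c}$ is a box, so that the projection reduces to componentwise clipping; the function names and the quadratic costs in the usage lines are hypothetical choices for illustration only.

```python
def project_box(u, lo, hi):
    """Euclidean projection onto the box {u : lo <= u <= hi} (componentwise clip)."""
    return [min(max(ui, li), hi_i) for ui, li, hi_i in zip(u, lo, hi)]


def controller_step(u_prev, x_k, grad_phi, grad_psi, H, eta, lo, hi):
    """One update u_k = Proj(u_{k-1} - eta * Psi_k(u_{k-1}, x_k)).

    Psi_k(u, x) = grad_phi(u) + H(u)^T grad_psi(x), where H(u) is an
    n_x-by-n_u Jacobian given as a nested list.
    """
    g_phi = grad_phi(u_prev)
    g_psi = grad_psi(x_k)
    jac = H(u_prev)
    n_u = len(u_prev)
    psi_map = [
        g_phi[j] + sum(jac[i][j] * g_psi[i] for i in range(len(g_psi)))
        for j in range(n_u)
    ]
    return project_box([u_prev[j] - eta * psi_map[j] for j in range(n_u)], lo, hi)


# Usage with illustrative quadratic costs and an identity Jacobian.
x_ref = [2.0, 2.0]
u_next = controller_step(
    u_prev=[0.0, 0.0],
    x_k=[1.0, -3.0],
    grad_phi=lambda u: list(u),                                # phi(u) = 0.5*||u||^2
    grad_psi=lambda x: [xi - ri for xi, ri in zip(x, x_ref)],  # psi(x) = 0.5*||x - x_ref||^2
    H=lambda u: [[1.0, 0.0], [0.0, 1.0]],
    eta=0.5,
    lo=[-1.0, -1.0],
    hi=[1.0, 1.0],
)
print(u_next)
```

Note that the update uses only the state sample $x_{k}$ and the previous input; the disturbance $w_{k}$ does not appear anywhere in the computation.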

Towards this, we consider a set of training points $\{x^{(i)},\zeta^{(i)}=q(x^{(i)})\}_{i=1}^{N}$ and, to guarantee that the network training is well-posed, we impose the following assumption.

Assumption 6

The training points $\{x^{(i)}\}_{i=1}^{N}$ are drawn from the compact set $\mathcal{X}_{\text{tr}}:=\mathcal{X}_{eq}+\mathcal{B}_{n}[r_{\text{tr}}]$ , $r_{0}\leq r_{\text{tr}}\leq r$ . $\square$

Hereafter, let $\mathcal{Q}:=q(\mathcal{X}_{\text{tr}})$ denote the perception set associated with the training set, which is a compact set. Assumption 6 allows us to leverage existing results on the bounds on the approximation error $\sup_{\zeta\in\mathcal{Q}}\|\hat{p}(\zeta)-p(\zeta)\|$ of feedforward neural networks and residual neural networks over the compact set $\mathcal{Q}$ ; see Hornik et al. (1989) and Marchi et al. (2022); Tabuada and Gharesifard (2020).

With an estimate $\hat{p}$ of the perception map $p$ obtained via a neural network, the proposed perception-based controller is shown in Figure 1 and is tabulated in Algorithm 1.

Algorithm 1 Regulation with NN State Perception

# Training

Given: training set $\{(x^{(i)},\zeta^{(i)})\}_{i=1}^{N}$

Obtain: $\hat{p}\leftarrow\operatorname{NN-learning}(\{(x^{(i)},\zeta^{(i)})\}_{i=1}^{N})$

# Gradient-based Sampled-Data Feedback Control

Given: set $\mathcal{U}_{c}$ , functions $\phi(\cdot,t),\psi(\cdot,t),H(u)$ , $\hat{p}$ , gain $\eta$

Initial conditions: $x(0)\in\mathcal{X}_{0}$ , $u(0)\in\mathcal{U}_{c}$

For $t\geq 0$ , $k\in\mathbb{Z}_{+}$ :


$\displaystyle\dot{x}(t)$	$\displaystyle=f(x(t),u(t),w(t))$	(5a)
$\displaystyle\zeta_{k}$	$\displaystyle=q(x(k\tau))$	(5b)
$\displaystyle u_{k}$	$\displaystyle=\Pi_{\mathcal{U}_{c}}\left\{u_{k-1}-\eta\Psi_{k}(u_{k-1},\hat{p}(\zeta_{k}))\right\}$	(5c)
$\displaystyle u(t)$	$\displaystyle=u_{k},\,\,\,\,t\in[k\tau,(k+1)\tau)$	(5d)

Figure 1: Block diagram of the proposed perception-based feedback controller in closed loop with the plant.

In the training phase, the operation $\operatorname{NN-learning}(\cdot)$ refers to a generic training procedure for the neural network via empirical risk minimization, which results in the approximate map $\hat{p}(\cdot)$ . In the proposed controller, $\hat{p}(\cdot)$ is utilized to obtain estimates of the state of the dynamical system $\hat{x}_{k}=\hat{p}(\zeta_{k})$ , which are subsequently used to compute the gradient map $\Psi_{k}(u,\hat{p}(\zeta))$ . Note that, as is standard in sampled-data systems, the input $u(t)$ is computed based on the control iterates $\{u_{k}\}$ as the piece-wise constant signal $u(t)=u_{k},t\in[k\tau,(k+1)\tau)$ , $k\in\mathbb{Z}_{+}$ .
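The control phase of Algorithm 1 can be sketched on a toy example. The code below is a minimal simulation under several assumptions that are not from the paper: a scalar plant $\dot{x}=-ax+bu+ew$, quadratic costs $\phi(u)=\tfrac{\rho}{2}u^{2}$ and $\psi(x)=\tfrac{1}{2}(x-x_{\mathrm{ref}})^{2}$, a constant disturbance, forward-Euler integration between samples, and a bounded additive error `eps_max * sin(k)` standing in for the output of a trained network $\hat{p}(q(x))$.

```python
import math


def simulate(eps_max=0.0, n_samples=60, tau=1.0, eta=1.0, dt=0.01):
    """Sampled-data projected-gradient loop with a zero-order hold (toy sketch)."""
    a, b, e = 2.0, 1.0, 0.5          # hypothetical plant x' = -a*x + b*u + e*w
    rho, x_ref, w = 0.1, 1.0, 0.4    # cost weights and (unmeasured) disturbance
    g = b / a                        # H(u): Jacobian of h_u for this plant
    lo, hi = -2.0, 2.0               # U_c = [lo, hi]
    x, u = 0.0, 0.0
    n_sub = int(round(tau / dt))
    for k in range(1, n_samples + 1):
        # (5a), (5d): the plant evolves over one period with the input held constant.
        for _ in range(n_sub):
            x += dt * (-a * x + b * u + e * w)
        # (5b): perceptual sample; bounded error stands in for p_hat(q(x)) - x.
        x_hat = x + eps_max * math.sin(k)
        # (5c): projected-gradient update using the state estimate only.
        psi_map = rho * u + g * (x_hat - x_ref)
        u = min(max(u - eta * psi_map, lo), hi)
    return x, u


# Optimizer of the reduced problem for this toy instance (interior of U_c).
u_star = 0.5 * (1.0 - 0.25 * 0.4) / (0.1 + 0.5 ** 2)

x_exact, u_exact = simulate(eps_max=0.0)
x_noisy, u_noisy = simulate(eps_max=0.05)
print(u_star, u_exact, u_noisy)
```

With exact state samples the input settles at the optimizer of this toy problem, whereas a nonzero perception error leaves a residual offset whose size scales with the error bound, which is the qualitative behavior the analysis in the next section quantifies.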

4 Stability and Tracking Analysis

To analyze the performance of the closed-loop system (5), recall that $(u^{*}_{k},x^{*}_{k})$ is the sequence of optimizers of the time-varying problem (2) at the times in $\mathbb{S}$ . As proposed in Belgioioso et al. (2021), we consider a discrete-time counterpart of (5); sampling (5) at times in $\mathbb{S}$ yields:


$\displaystyle x_{k+1}$	$\displaystyle=F(x_{k},u_{k},w_{k}),$	(6a)
$\displaystyle\zeta_{k}$	$\displaystyle=q(x_{k}),$	(6b)
$\displaystyle u_{k}$	$\displaystyle=\Pi_{\mathcal{U}_{c}}\left\{u_{k-1}-\eta\Psi_{k}(u_{k-1},\hat{p}(\zeta_{k}))\right\},$	(6c)

where $F(X_{0},v,w)$ denotes the solution of the initial value problem $\dot{X}(t)=f(X(t),v,w(t))$ , with $X(t_{0})=X_{0}\in\mathcal{X}_{r}$ at time $t=t_{0}+\tau$ , where $\tau$ is the sampling period. The tracking results for (6) will then translate into transient bounds for the sampled-data system (5) by using (Nešić et al., 1999, Theorem 5).
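Since $F$ is defined through the solution of an initial value problem, in simulation it is typically evaluated numerically. The following sketch approximates $F$ with a fixed-step fourth-order Runge-Kutta scheme; the function name `sampled_map`, the step count, and the scalar test plant are hypothetical, and any other ODE integrator could be substituted.

```python
def sampled_map(f, x0, v, w_of_t, t0, tau, n_steps=100):
    """Approximate F(x0, v, w): the state at t0 + tau of x' = f(x, v, w(t)),
    x(t0) = x0, with the input v held constant over the period (RK4)."""
    dt = tau / n_steps
    x, t = list(x0), t0

    def shifted(state, slope, scale):
        return [s + scale * k for s, k in zip(state, slope)]

    for _ in range(n_steps):
        k1 = f(x, v, w_of_t(t))
        k2 = f(shifted(x, k1, dt / 2), v, w_of_t(t + dt / 2))
        k3 = f(shifted(x, k2, dt / 2), v, w_of_t(t + dt / 2))
        k4 = f(shifted(x, k3, dt), v, w_of_t(t + dt))
        x = [
            xi + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
            for xi, a, b, c, d in zip(x, k1, k2, k3, k4)
        ]
        t += dt
    return x


# Usage on an illustrative scalar plant x' = -2*x + v + 0.5*w with constant w.
plant = lambda x, v, w: [-2.0 * x[0] + v + 0.5 * w]
x_next = sampled_map(plant, x0=[0.0], v=1.0, w_of_t=lambda t: 0.4, t0=0.0, tau=1.0)
print(x_next)
```

For this linear test plant the exact value is $0.6\,(1-e^{-2})$, which gives a simple way to sanity-check the integrator before using it on nonlinear dynamics.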

Let $z_{k}:=(x_{k}-x^{*}_{k},u_{k}-u^{*}_{k})$ and define the matrix $M_{1}\in\mathbb{R}^{2\times 2}$ as,

$\displaystyle M_{1}:=\begin{bmatrix}c_{P}&\eta\ell_{x}\ell_{h_{u}}c_{w}/\sqrt{d_{1}}\\ c_{w}\ell_{h_{x}}\sqrt{d_{1}}(1+c_{P})&c_{w}\left(1+c_{w}\eta\ell_{x}\ell_{h_{x}}^{2}\right)\end{bmatrix},$ (7)

where $c_{w}:=e^{-d_{3}\tau/2}\sqrt{d_{2}/d_{1}}$ , $d_{1}>0$ is given in Assumption 3, and $c_{P}:=\sqrt{1-\eta(2\mu-\eta\ell^{2})}$ ; we further define:

$\displaystyle M_{2}:=\begin{bmatrix}1&\eta\ell_{x}\ell_{h_{u}}\\ c_{w}\ell_{h_{u}}\sqrt{d_{1}}(c_{P}+1)&c_{w}\sqrt{d_{1}}\ell_{h_{u}}\eta\ell_{x}\ell_{h_{u}}\end{bmatrix},$ (8)

$\displaystyle M_{3}:=\begin{bmatrix}c_{w}\eta\ell_{x}\ell_{h_{u}}^{2}(\sqrt{\tau}+1)\\ \frac{\eta\ell_{x}\ell_{h_{u}}}{\sqrt{d_{1}}}\sqrt{\tau}\end{bmatrix}.$ (9)

With these definitions, our main result provides transient bounds for the system (6).

Theorem 1 (Transient bound for (6))

Consider the closed-loop system (6). Let Assumptions 1-5 hold, and assume that

$\displaystyle\tau>\frac{1}{d_{3}}\log\left(\frac{d_{2}}{d_{1}}\right),\quad\eta\in\left(0,\frac{2\mu}{\ell^{2}}\right).$ (10)

Then, the tracking error satisfies

$\displaystyle\|z_{k}\|\leq\frac{r_{M_{1}}m_{2}}{m_{1}}c_{M_{1}}^{k+1}\|z_{0}\|+b\|M_{3}\|\sigma_{w}\left(\sup_{0\leq s\leq\tau k}\|\dot{w}(s)\|\right)+b\|M_{2}\|\left\|\begin{bmatrix}\sup_{1\leq s\leq k}\|u_{s}^{*}-u_{s-1}^{*}\|\\ \sup_{\zeta\in\mathcal{Q}}\|\hat{p}(\zeta)-p(\zeta)\|+\bar{\varepsilon}\end{bmatrix}\right\|,$ (11)

for any $u(0)\in\mathcal{U}_{c},x(0)\in\mathcal{X}_{eq}+\mathbb{B}_{n}(r_{I})$ for a sufficiently small $0<r_{I}<r_{0}$ , where $m_{1}:=\min\left\{1,\sqrt{d_{1}}\right\},m_{2}:=\max\left\{1,\sqrt{d_{2}}\right\}$ , $b:=\frac{r_{M_{1}}c_{M_{1}}}{m_{1}(1+c_{M_{1}})}$ , and the constants $r_{M_{1}}>0$ and $c_{M_{1}}\in[0,1)$ are s.t. $\|M_{1}^{k}\|\leq r_{M_{1}}c_{M_{1}}^{k},$ $\forall k\in\mathbb{Z}_{+}$ . $\square$

Theorem 1 guarantees exponential convergence of the sampled trajectory of the tracking error to a neighborhood of zero. The size of the neighborhood depends on: $\sup_{\zeta\in\mathcal{Q}}\|\hat{p}(\zeta)-p(\zeta)\|+\bar{\varepsilon}$ , which corresponds to the error associated with the state estimation; $\sup_{1\leq s\leq k}\|u_{s}^{*}-u_{s-1}^{*}\|$ , which captures the time-variability of the optimizer; and $\sigma_{w}(\sup_{0\leq s\leq\tau k}\|\dot{w}(s)\|)$ , which corresponds to the time-variability of the unknown disturbance. The proof is provided in the extended version of the paper Cothren et al. (2022b).
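The two conditions in (10) have a direct interpretation in terms of the constants $c_{w}$ and $c_{P}$ defined after (7): the bound on $\tau$ is equivalent to $c_{w}<1$, and the bound on $\eta$ is equivalent to $c_{P}<1$. The helper below evaluates these quantities for given constants; the function name and the numerical values in the usage lines are hypothetical, and the sketch does not check any property of $M_{1}$ beyond these two scalars.

```python
import math


def check_conditions(tau, eta, d1, d2, d3, mu, ell):
    """Evaluate the thresholds in (10) and the contraction constants c_w, c_P.

    Assumes ell >= mu > 0 and d2 >= d1 > 0, so the square roots are well defined.
    """
    tau_min = math.log(d2 / d1) / d3          # lower bound on the sampling period
    eta_max = 2.0 * mu / ell ** 2             # upper bound on the step size
    c_w = math.exp(-d3 * tau / 2.0) * math.sqrt(d2 / d1)
    c_p = math.sqrt(1.0 - eta * (2.0 * mu - eta * ell ** 2))
    return {
        "tau_min": tau_min,
        "eta_max": eta_max,
        "c_w": c_w,
        "c_P": c_p,
        "ok": tau > tau_min and 0.0 < eta < eta_max,
    }


# Illustrative constants (not taken from the paper).
res_ok = check_conditions(tau=1.0, eta=1.0, d1=1.0, d2=4.0, d3=2.0, mu=0.35, ell=0.35)
res_fast = check_conditions(tau=0.5, eta=1.0, d1=1.0, d2=4.0, d3=2.0, mu=0.35, ell=0.35)
print(res_ok)
print(res_fast)
```

In the second call the sampling period is below the threshold, so $c_{w}>1$: the plant has not contracted enough between two consecutive samples for the bound to apply.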

The transient bound (11) shows that the sampled system (6) is input-to-state stable (ISS), in the sense of Jiang and Wang (2001), with respect to $\|\dot{w}\|$ , the drift on the optimal solution $\|u_{k}^{*}-u_{k-1}^{*}\|$ , and the error introduced by the estimated perception map. To translate the results of Theorem 1 into transient bounds for the continuous system (5), we leverage Theorem 5 of Nešić et al. (1999). In particular, let $z(t):=(x(t)-x^{*}(t),u_{k}-u^{*}(t))$ , where $u^{*}(t)$ is piece-wise constant and such that $u^{*}(t)=u^{*}_{k}$ for $t\in[k\tau,(k+1)\tau)$ , and $x^{*}(t)$ is defined similarly. Then, we obtain the following.

Theorem 2 (Transient bound for (5))

Consider the closed-loop system (5). Let Assumptions 1-5 hold, and assume that $\tau,\eta$ satisfy (10). Then, there exist $\beta\in\mathcal{KL}$ and $\gamma_{w},\gamma_{u},\gamma_{p}\in\mathcal{K}_{\infty}$ such that

$\displaystyle\|z(t)\|\leq\beta(\|z_{0}\|,t)+\gamma_{w}\left(\sup_{0\leq s\leq t}\|\dot{w}(s)\|\right)+\gamma_{u}\left(\Delta_{u}^{*}\right)+\gamma_{p}\left(\sup_{\zeta\in\mathcal{Q}}\|\hat{p}(\zeta)-p(\zeta)\|+\bar{\varepsilon}\right)$ (12)

holds for any $u(0)\in\mathcal{U}_{c},x(0)\in\mathcal{X}_{eq}+\mathbb{B}_{n}(r_{I})$ for a sufficiently small $0<r_{I}<r_{0}$ , given that $\Delta_{u}^{*}:=\sup_{k\in\mathbb{Z}_{+}}\|u^{*}_{k}-u^{*}_{k-1}\|\leq r_{u}$ , $\sup_{0\leq s\leq t}\|\dot{w}(s)\|\leq r_{w}$ , and $\sup_{\zeta\in\mathcal{Q}}\|\hat{p}(\zeta)-p(\zeta)\|+\bar{\varepsilon}\leq r_{p}$ for some finite $r_{w},r_{u},r_{p}>0$ . $\square$

Mirroring (11), the bound (12) shows that the system (5) is ISS with respect to $\|\dot{w}\|$ , the drift on the optimal solution $\Delta_{u}^{*}$ , and the error introduced by the neural network.

To provide a connection with the existing literature, we point out that (12) generalizes the following sub-cases: (i) when the state $x(t)$ can be observed (without errors), then (12) reduces to a bound similar to Bianchin et al. (2021) (where, however, the controller is a continuous-time gradient flow); (ii) when the state $x(t)$ can be observed and the functions $\phi(u)$ and $\psi(x)$ are time invariant, then (12) boils down to the ISS result of Belgioioso et al. (2021) (for sampled-data controllers) and Colombino et al. (2020) (for continuous-time gradient flows) and Zheng et al. (2020).

Remark 3

When a residual network is utilized to estimate the map $p$ , the error $\sup_{\zeta\in\mathcal{Q}}\|\hat{p}(\zeta)-p(\zeta)\|$ on the compact set $\mathcal{Q}$ in (12) can be bounded as shown in Marchi et al. (2022); Tabuada and Gharesifard (2020), under given conditions on the selection of the training points. The bound in Marchi et al. (2022); Tabuada and Gharesifard (2020) is particularly interesting because it ties the approximation error with the training error and the geometry of the residual network. We do not include a customization of the bound (12) using the results of Tabuada and Gharesifard (2020) due to space limitations. $\square$

Theorem 2 follows from the results of Theorem 1 and Theorem 5 of Nešić et al. (1999) by noticing that (5) is uniformly bounded over an interval $\tau$ in the sense of (Nešić et al., 1999, Definition 2). This is because $u(t)$ is piece-wise constant and takes values from the compact set $\mathcal{U}_{c}$ , $\mathcal{W}_{c}$ is compact (and $\|\dot{w}\|$ is bounded), and the perception error is bounded; boundedness of $x(t)$ w.r.t. the sampled sequence $\{x_{k}\}$ follows from (Belgioioso et al., 2021, Lemma 1). The proof is omitted due to space limitations.

5 Application to Autonomous Driving

We utilize our controller in Algorithm 1 to control a vehicle to track a set of reference points while avoiding obstacles; the position of the vehicle is accessible only through camera images. The vehicle’s movement is modeled by the unicycle dynamics with state $x=(a,b,\theta)$ , where $r:=(a,b)\in\mathbb{R}^{2}$ is the position in the 2D plane and $\theta\in(-\pi,\pi]$ is the orientation with respect to the $a$ -axis: $\dot{a}=v\cos(\theta)$ , $\dot{b}=v\sin(\theta),\,\,\dot{\theta}=w$ , where $v,w\in\mathbb{R}$ are controllable inputs. Importantly, we assume that we do not have direct knowledge of the state $x=(a,b,\theta)$ ; instead, we have access to camera images, represented by the generative map $\zeta=q(x)$ , from which the position $r=(a,b)$ is recovered. We consider a lower-level stabilizer to stabilize the unicycle dynamics to satisfy Assumption 3. Following (Cothren et al., 2022a, Lemma 2), let $u=(u_{a},u_{b})$ denote the control inputs for each direction $a$ and $b$ , and consider the change of coordinates to the error variables, $\xi:=\|u-r\|$ , $\phi:=\operatorname{atan}\left(\frac{u_{b}-b}{u_{a}-a}\right)-\theta$ . The dynamics of these are $\dot{\xi}=-v\cos(\phi)$ and $\dot{\phi}=\frac{v}{\xi}\sin(\phi)-w$ . By setting

$\displaystyle v=\kappa\xi\cos(\phi),\,\,w=\kappa(\cos(\phi)+1)\sin(\phi)+\kappa\phi,$ (13)

for some $\kappa>0$ , the unicycle dynamics admit a globally exponentially stable equilibrium point $(\xi,\phi)=(0,0)$ . By setting the inputs $v,w\in\mathbb{R}$ as in (13), the dynamic plant satisfies Assumption 3.
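The effect of the stabilizer (13) can be checked in simulation. The sketch below integrates the unicycle with forward Euler under the feedback law (13) for a fixed target $u=(u_{a},u_{b})$; the step size, horizon, gain value, and function names are hypothetical, and `atan2` with angle wrapping is used in place of $\operatorname{atan}$ so that the bearing is well defined in all quadrants (an implementation choice, not something stated in the paper). Substituting (13) into the error dynamics gives $\dot{\xi}=-\kappa\xi\cos^{2}(\phi)$ and $\dot{\phi}=-\kappa(\sin(\phi)+\phi)$, so both error variables are expected to decay.

```python
import math


def wrap(angle):
    """Wrap an angle to the interval [-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))


def stabilizer(state, target, kappa):
    """Feedback law (13) expressed in the error variables (xi, phi)."""
    a, b, theta = state
    xi = math.hypot(target[0] - a, target[1] - b)
    phi = wrap(math.atan2(target[1] - b, target[0] - a) - theta)
    v = kappa * xi * math.cos(phi)
    w = kappa * (math.cos(phi) + 1.0) * math.sin(phi) + kappa * phi
    return v, w


def simulate(state, target, kappa=1.0, horizon=10.0, dt=1e-3):
    """Forward-Euler integration of the unicycle under the stabilizer."""
    a, b, theta = state
    for _ in range(int(round(horizon / dt))):
        v, w = stabilizer((a, b, theta), target, kappa)
        a += dt * v * math.cos(theta)
        b += dt * v * math.sin(theta)
        theta += dt * w
    return a, b, theta


final = simulate(state=(0.0, 0.0, 0.0), target=(1.0, 1.0))
distance = math.hypot(1.0 - final[0], 1.0 - final[1])
print(final, distance)
```

In the closed loop of Algorithm 1, the target passed to this stabilizer would be the input $u_{k}$ produced by the gradient-based controller and held constant over each sampling period.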

We consider a sequence of locations that the vehicle would like to follow, denoted as $\mathcal{T}_{d}:=\{r_{d,k}\in\mathcal{R},k\in\mathbb{Z}_{+}\}$ , where $\mathcal{R}:=\{r=(a,b):x=(a,b,\theta)\in\mathcal{X}\}$ . To avoid obstacles around the vehicle, we consider building at each time $\tau k$ , $k\in\mathbb{Z}_{+}$ , the free workspace of the vehicle defined as $\mathcal{F}_{k}(r_{k}):=\{r:b^{(i)}(r_{k})-a_{k}^{(i)}(r_{k})^{\top}r\geq 0,i=1,\dots,M_{k}\}$ , where $M_{k}$ is the number of obstacles at time $\tau k$ , $r^{(i)}$ is the center position of the $i$ th obstacle, $a_{k}^{(i)}(r_{k})=r^{(i)}-r_{k}$ , $b^{(i)}(r_{k})$ is a scalar computed depending on vehicle and obstacles positions, and $r_{k}\in\mathcal{R}$ is the position at time $\tau k$ . The free workspace $\mathcal{F}_{k}(r_{k})$ describes a local neighborhood of the vehicle that is guaranteed to be free of obstacles.

To track the desired trajectory $\mathcal{T}_{d}$ while avoiding obstacles, we utilize the following waypoint-tracking formulation with a barrier function

$\displaystyle\psi_{k}(r):=\frac{1}{2}\|r-r_{d,k}\|^{2}-\lambda_{k}\sum_{i=1}^{M_{k}}\log\left(b^{(i)}(r_{k})-a_{k}^{(i)}(r_{k})^{\top}r\right),$

where $\lambda_{k}>0$ is a tuning parameter; in particular, in the simulations we set $\lambda_{k}=1/\exp(0.1k)$ . We set $\phi_{k}(u)=0$ .
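The controller only needs the gradient of this cost, which for the log-barrier term is $\nabla\psi_{k}(r)=(r-r_{d,k})+\lambda_{k}\sum_{i}a^{(i)}/(b^{(i)}-a^{(i)\top}r)$. The sketch below evaluates the cost and its gradient for a list of half-spaces given as pairs $(a^{(i)},b^{(i)})$; the function names and the single half-space in the usage lines are hypothetical, and the code assumes the evaluation point is strictly inside the free workspace.

```python
import math


def psi(r, r_d, lam, halfspaces):
    """Waypoint-tracking cost with a log barrier for constraints a^T r < b."""
    value = 0.5 * sum((ri - di) ** 2 for ri, di in zip(r, r_d))
    for a, b in halfspaces:
        slack = b - sum(ai * ri for ai, ri in zip(a, r))
        if slack <= 0.0:
            return math.inf  # outside the free workspace
        value -= lam * math.log(slack)
    return value


def grad_psi(r, r_d, lam, halfspaces):
    """Gradient of psi: (r - r_d) + lam * sum_i a_i / (b_i - a_i^T r)."""
    grad = [ri - di for ri, di in zip(r, r_d)]
    for a, b in halfspaces:
        slack = b - sum(ai * ri for ai, ri in zip(a, r))
        if slack <= 0.0:
            raise ValueError("point is not strictly inside the free workspace")
        grad = [gi + lam * ai / slack for gi, ai in zip(grad, a)]
    return grad


# Usage: waypoint at (2, 0) behind a wall a^T r < 1 with a = (1, 0).
walls = [((1.0, 0.0), 1.0)]
g = grad_psi([0.0, 0.0], [2.0, 0.0], 0.5, walls)
print(g)
```

The barrier term pushes the gradient away from the obstacle boundary as the slack shrinks, while a decaying $\lambda_{k}$ progressively reduces its influence relative to the tracking term.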

We used a residual neural network to estimate the position of the vehicle from aerial images. Specifically, the network returns estimated coordinates in the $(a,b)$ plane. For training, we generated $94{,}500$ images of size $64\times 64$ pixels depicting a red bot in the roundabout setting in Figure 2. The images were built using the MATLAB Image Processing Toolbox and basic plotting functions therein by setting the background as an aerial view of a roundabout and plotting a red square for the vehicle. We used the resnet50 structure given in the MATLAB Deep Learning Toolbox and tailored the input ( $64\times 64\times 3$ sized RGB images) and output (total number of labels) sizes to our specific case. For labels, we selected the pixels that corresponded to parts of the image containing the road so that the network only trains on data corresponding to allowable surfaces for the robot, which totaled $135$ unique labels. Finally, we selected five checkpoints along the road for the robot to follow (denoted by the red stars in Figure 2) during the execution of the algorithm, corresponding to $\{r_{d,k}\}$ .

Simulation results for Algorithm 1 are given in Fig. 2. The dashed magenta line corresponds to the trajectory of the unicycle under the neural network-assisted controller, and the yellow line to the controller with perfect state information. Fig. 3 shows the error $\|r(t)-r^{*}(t)\|$ between the actual and optimal positions of the vehicle. Importantly, the trajectory tracks the time-varying reference points up to a bounded error and avoids the obstacles.

6 Conclusion

We proposed an algorithm to regulate dynamical systems towards the solution of a convex optimization problem when we do not have full knowledge of the system states. Specifically, we developed a sampled-data feedback controller that is augmented with a neural network to estimate the state of the system from high-dimensional sensory data. Our results guarantee exponential convergence of the interconnected system up to an error term that depends on the temporal variability of the problem and on the error incurred by estimating the state with the neural network.

References

Agarwal et al. (2022) Agarwal, A., Simpson-Porco, J.W., and Pavel, L. (2022). Game-theoretic feedback-based optimization. IFAC-PapersOnLine, 55(13), 174–179.
Al Makdah et al. (2020) Al Makdah, A.A., Katewa, V., and Pasqualetti, F. (2020). Accuracy prevents robustness in perception-based control. In American Control Conference, 3940–3946.
Belgioioso et al. (2021) Belgioioso, G., Liao-McPherson, D., de Badyn, M.H., Bolognani, S., Lygeros, J., and Dörfler, F. (2021). Sampled-data online feedback equilibrium seeking: Stability and tracking. In IEEE Conference on Decision and Control.
Bianchin et al. (2021) Bianchin, G., Cortés, J., Poveda, J.I., and Dall’Anese, E. (2021). Time-varying optimization of LTI systems via projected primal-dual gradient flows. IEEE Trans. on Control of Networked Systems.
Bianchin et al. (2022) Bianchin, G., Poveda, J.I., and Dall’Anese, E. (2022). Online optimization of switched LTI systems using continuous-time and hybrid accelerated gradient flows. Automatica, 146, 110579.
Brunner et al. (2012) Brunner, F.D., Dürr, H.B., and Ebenbauer, C. (2012). Feedback design for multi-agent systems: A saddle point approach. In IEEE Conference on Decision and Control, 3783–3789.
Chou et al. (2022) Chou, G., Ozay, N., and Berenson, D. (2022). Safe output feedback motion planning from images via learned perception modules and contraction theory. arXiv preprint arXiv:2206.06553.
Colombino et al. (2020) Colombino, M., Dall’Anese, E., and Bernstein, A. (2020). Online optimization as a feedback controller: Stability and tracking. IEEE Trans. on Control of Networked Systems, 7(1), 422–432.
Cothren et al. (2022a) Cothren, L., Bianchin, G., and Dall’Anese, E. (2022a). Online optimization of dynamical systems with deep learning perception. IEEE Open Journal of Control Systems, 1, 306–321.
Cothren et al. (2022b) Cothren, L., Bianchin, G., Dean, S., and Dall’Anese, E. (2022b). Perception-based sampled-data optimization of dynamical systems (longer version). arXiv:2211.10020.
Dawson et al. (2022) Dawson, C., Lowenkamp, B., Goff, D., and Fan, C. (2022). Learning safe, generalizable perception-based hybrid control with certificates. IEEE Robotics and Automation Letters, 7(2), 1904–1911.
Dean and Recht (2021) Dean, S. and Recht, B. (2021). Certainty equivalent perception-based control. In Learning for Dynamics and Control, 399–411. PMLR.
Hauswirth et al. (2021) Hauswirth, A., Bolognani, S., Hug, G., and Dörfler, F. (2021). Timescale separation in autonomous optimization. IEEE Trans. on Automatic Control, 66(2), 611–624.
Hirata et al. (2014) Hirata, K., Hespanha, J.P., and Uchida, K. (2014). Real-time pricing leading to optimal operation under distributed decision makings. In American Control Conf.
Hornik et al. (1989) Hornik, K., Stinchcombe, M., and White, H. (1989). Multilayer feedforward networks are universal approximators. Neural networks, 2(5), 359–366.
Jiang and Wang (2001) Jiang, Z.P. and Wang, Y. (2001). Input-to-state stability for discrete-time nonlinear systems. Automatica, 37(6), 857–869.
Jokic et al. (2009) Jokic, A., Lazar, M., and Van Den Bosch, P.P.J. (2009). On constrained steady-state regulation: Dynamic KKT controllers. IEEE Trans. on Automatic Control, 54(9), 2250–2254.
Khalil (2002) Khalil, H.K. (2002). Nonlinear Systems; 3rd ed. Prentice-Hall, Upper Saddle River, NJ.
Lawrence et al. (2020) Lawrence, L.S., Simpson-Porco, J.W., and Mallada, E. (2020). Linear-convex optimal steady-state control. IEEE Transactions on Automatic Control, 66(11), 5377–5384.
Marchi et al. (2022) Marchi, M., Bunton, J., Gharesifard, B., and Tabuada, P. (2022). Safety and stability guarantees for control loops with deep learning perception. IEEE Control Systems Letters, 6, 1286–1291.
Murillo-González and Poveda (2022) Murillo-González, A. and Poveda, J.I. (2022). Data-assisted vision-based hybrid control for robust stabilization with obstacle avoidance via learning of perception maps. In 2022 American Control Conference (ACC), 886–892. IEEE.
Nešić et al. (1999) Nešić, D., Teel, A.R., and Sontag, E.D. (1999). Formulas relating KL stability estimates of discrete-time and sampled-data nonlinear systems. Systems & Control Letters, 38(1), 49–60.
Rudin (1976) Rudin, W. (1976). Principles of mathematical analysis, volume 3. McGraw-Hill, New York.
Simpson-Porco (2021) Simpson-Porco, J.W. (2021). Low-gain stability of projected integral control for input-constrained discrete-time nonlinear systems. IEEE Control Systems Letters, 6, 788–793.
Tabuada and Gharesifard (2020) Tabuada, P. and Gharesifard, B. (2020). Universal approximation power of deep residual neural networks via nonlinear control theory. arXiv preprint arXiv:2007.06007.
Xu et al. (2021) Xu, J., Lee, B., Matni, N., and Jayaraman, D. (2021). How are learned perception-based controllers impacted by the limits of robust control? In Learning for Dynamics and Control, 954–966.
Zheng et al. (2020) Zheng, T., Simpson-Porco, J., and Mallada, E. (2020). Implicit trajectory planning for feedback linearizable systems: A time-varying optimization approach. In American Control Conference, 4677–4682.

Appendix A Proofs

Proof of Theorem 1.

The proof of Theorem 1 is divided into two main parts. In the first part, we construct intermediate bounds to be used in the second part of the proof.

(Part 1.a: Lyapunov Function) The first step is to leverage the results in Lemma 1 of Belgioioso et al. (2021). By Assumption 3, we have that for a fixed $u\in\mathcal{U}$ ,

\displaystyle\dot{V}(t)\leq-d_{3}V(t)+\sigma_{w}(\|\dot{w}\|).

(14)

Given the initial condition $V(t_{0})=V_{0}$ , (14) implies that:

	$\displaystyle V(x(t),u,w_{t})\leq V_{0}e^{-d_{3}(t-t_{0})}+\int_{t_{0}}^{t}\sigma_{w}(\|\dot{w}(s)\|)ds$			(15)
		$\displaystyle\leq V_{0}e^{-d_{3}(t-t_{0})}+\sigma_{w}(\bar{w}),$		(16)

where $\bar{w}:=\sup_{s\in[t_{0},t]}\|\dot{w}(s)\|$ . Define $V_{k}:=V(x_{k},u_{k},w_{k})$ and $W_{k}:=\sqrt{V_{k}}$ . Then, from Lemma 1 of Belgioioso et al. (2021), it follows that

\displaystyle W_{k+1}\leq c_{w}W_{k}+c_{w}\ell_{h_{u}}\sqrt{d_{1}}\|u_{k+1}-u_{k}\|+\sqrt{\tau}\sigma_{w}^{\prime}(\bar{w}),

where $c_{w}:=e^{-d_{3}\tau/2}\sqrt{d_{2}/d_{1}}$ , $\ell_{h_{u}}$ is the Lipschitz constant of the steady state map w.r.t. $u$ , and $\sigma_{w}^{\prime}:=\sqrt{\sigma_{w}}\in\mathcal{K}$ .
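To illustrate the step from (14) to (15), the following Python sketch integrates the worst case of (14), that is $\dot{V}=-d_{3}V+\sigma_{w}$ with equality, and compares the result with the right-hand side of (15). It assumes a constant value `sigma_bar` for $\sigma_{w}(\|\dot{w}(s)\|)$ and $t_{0}=0$; the function names and numbers are hypothetical, and forward Euler is used only for simplicity.

```python
import math

def lyapunov_bound(V0, d3, sigma_bar, t):
    """Right-hand side of (15) with t0 = 0 and constant sigma_w(||w_dot||)."""
    return V0 * math.exp(-d3 * t) + sigma_bar * t

def simulate_worst_case(V0, d3, sigma_bar, t, steps=10000):
    """Forward-Euler integration of V' = -d3 * V + sigma_bar on [0, t]."""
    dt = t / steps
    V = V0
    for _ in range(steps):
        V += dt * (-d3 * V + sigma_bar)
    return V

# Hypothetical values; t plays the role of one sampling period.
V_end = simulate_worst_case(V0=2.0, d3=1.5, sigma_bar=0.3, t=0.5)
V_bound = lyapunov_bound(V0=2.0, d3=1.5, sigma_bar=0.3, t=0.5)
```

In this sketch the simulated value stays below the bound because the integral in (15) drops the decaying factor $e^{-d_{3}(t-s)}$ that multiplies the disturbance term in the exact solution.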

(Part 1.b: Bound for $\|u_{k+1}-u_{k}\|$ ) To simplify notation, rewrite the controller as $u_{k}:=T_{k}(u_{k-1},\hat{x}_{k})$ , where $T_{k}(u,x)=\Pi_{\mathcal{U}_{c}}\left\{u-\eta\Psi_{k}(u,x)\right\}$ . Moreover, let $e_{x,k}:=\hat{x}_{k}-x_{k}$ denote the state-estimation error introduced by the perception map. Using this notation, calculate:

	$\displaystyle\|u_{k+1}-u_{k}\|=\|T_{k+1}(u_{k},\hat{x}_{k+1})-u_{k}\|$
	$\displaystyle=\|T_{k+1}(u_{k},\hat{x}_{k+1})-T_{k+1}(u_{k+1}^{*},h(u_{k+1}^{*},w_{k+1}))$
	$\displaystyle~{}~{}~{}+u_{k+1}^{*}-u_{k}^{*}+u_{k}^{*}-u_{k}\|$
	$\displaystyle=\|T_{k+1}(u_{k},\hat{x}_{k+1})-T_{k+1}(u_{k},h(u_{k},w_{k+1}))$
	$\displaystyle~{}~{}~{}+T_{k+1}(u_{k},h(u_{k},w_{k+1}))-T_{k+1}(u_{k+1}^{*},h(u_{k+1}^{*},w_{k+1}))$
	$\displaystyle~{}~{}~{}+u_{k+1}^{*}-u_{k}^{*}+u_{k}^{*}-u_{k}\|$
	$\displaystyle\leq\|T_{k+1}(u_{k},\hat{x}_{k+1})-T_{k+1}(u_{k},h(u_{k},w_{k+1}))\|$
	$\displaystyle~{}~{}~{}+\|T_{k+1}(u_{k},h(u_{k},w_{k+1}))-T_{k+1}(u_{k+1}^{*},h(u_{k+1}^{*},w_{k+1}))\|$
	$\displaystyle~{}~{}~{}+\|u_{k+1}^{*}-u_{k}^{*}\|+\|u_{k}^{*}-u_{k}\|$
	$\displaystyle\leq\eta\ell_{x}\ell_{h_{u}}\|x_{k+1}+e_{x,k+1}-h(u_{k},w_{k+1})\|$
	$\displaystyle~{}~{}~{}+\|T_{k+1}(u_{k},h(u_{k},w_{k+1}))-T_{k+1}(u_{k+1}^{*},h(u_{k+1}^{*},w_{k+1}))\|$
	$\displaystyle~{}~{}~{}+\|u_{k+1}^{*}-u_{k}^{*}\|+\|u_{k}^{*}-u_{k}\|$
	$\displaystyle\leq\eta\ell_{x}\ell_{h_{u}}\|x_{k+1}-h(u_{k},w_{k+1})\|+\eta\ell_{x}\ell_{h_{u}}\|e_{x,k+1}\|$
	$\displaystyle~{}~{}~{}+\|T_{k+1}(u_{k},h(u_{k},w_{k+1}))-T_{k+1}(u_{k+1}^{*},h(u_{k+1}^{*},w_{k+1}))\|$
	$\displaystyle~{}~{}~{}+\|u_{k+1}^{*}-u_{k}^{*}\|+\|u_{k}^{*}-u_{k}\|.$

Consider the first term, $\|x_{k+1}-h(u_{k},w_{k+1})\|$ . We further bound this by applying Assumption 3 and by using the fact that $W:=\sqrt{V}$ :

	$\displaystyle\|x_{k+1}-h(u_{k},w_{k+1})\|$	$\displaystyle\leq\frac{1}{\sqrt{d_{1}}}W(x_{k+1},u_{k},w_{k+1})$
		$\displaystyle\leq\frac{1}{\sqrt{d_{1}}}\left(c_{w}W(x_{k},u_{k},w_{k})+\sqrt{\tau}\sigma_{w}^{\prime}(\bar{w})\right).$

To bound the second line of the final inequality, recall that

\displaystyle\|T_{k+1}(u,h(u,w_{k+1}))-T_{k+1}(y,h(y,w_{k+1}))\|\leq c_{P}\|u-y\|

holds for any $u,y\in\mathcal{U}$ due to Assumption 3. Thus,

	$\displaystyle\|T_{k+1}(u_{k},h(u_{k},w_{k+1}))-T_{k+1}(u_{k+1}^{*},h(u_{k+1}^{*},w_{k+1}))\|$
		$\displaystyle\leq c_{P}\|u_{k}-u_{k+1}^{*}\|$
		$\displaystyle\leq c_{P}\left(\|u_{k+1}^{*}-u_{k}^{*}\|+\|u_{k}^{*}-u_{k}\|\right).$

In total,

\displaystyle\begin{split}\|u_{k+1}&-u_{k}\|\leq\frac{\eta\ell_{x}\ell_{h_{u}}}{\sqrt{d_{1}}}\left(c_{w}W(x_{k},u_{k},w_{k})+\sqrt{\tau}\sigma_{w}^{\prime}(\bar{w})\right)\\ &~{}~{}~{}+\eta\ell_{x}\ell_{h_{u}}\|e_{x,k+1}\|\\ &~{}~{}~{}+(c_{P}+1)\left(\|u_{k+1}^{*}-u_{k}^{*}\|+\|u_{k}^{*}-u_{k}\|\right).\end{split}

(17)
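The operator $T_{k}(u,x)=\Pi_{\mathcal{U}_{c}}\{u-\eta\Psi_{k}(u,x)\}$ and the contraction property used in Part 1.b can be sketched numerically as follows. This is only an illustration under stated assumptions: the set $\mathcal{U}_{c}$ is taken to be a box, the steady-state map is a hypothetical linear map `h(u) = G u`, and `Psi` is the gradient of a hypothetical quadratic cost; none of these choices come from the paper. For this choice, the composite map $u\mapsto\Psi(u,h(u))$ is linear with symmetric positive definite matrix `H`, so a contraction constant playing the role of $c_{P}$ can be computed from its eigenvalues.

```python
import numpy as np

def project_box(u, lo, hi):
    """Euclidean projection onto the box {u : lo <= u <= hi}."""
    return np.clip(u, lo, hi)

def T(u, x, eta, Psi, lo, hi):
    """One projected-gradient step: Pi_U{ u - eta * Psi(u, x) }."""
    return project_box(u - eta * Psi(u, x), lo, hi)

# Hypothetical problem data (assumptions for illustration only).
Q = np.diag([1.0, 2.0])
G = np.array([[1.0, 0.5], [0.0, 1.0]])
x_ref = np.array([1.0, -1.0])

def Psi(u, x):
    """Gradient of 0.5 u^T Q u + 0.5 ||x - x_ref||^2 through x = G u."""
    return Q @ u + G.T @ (x - x_ref)

def h(u):
    """Hypothetical linear steady-state map."""
    return G @ u

# With x = h(u), Psi(u, h(u)) = H u - G^T x_ref, where H = Q + G^T G.
H = Q + G.T @ G
eigs = np.linalg.eigvalsh(H)
mu, L = eigs[0], eigs[-1]
eta = 1.0 / L
c_P = max(abs(1.0 - eta * mu), abs(1.0 - eta * L))  # contraction constant
```

With the step size `eta = 1/L`, the constant `c_P` equals `1 - mu/L`, which lies in $(0,1)$; the projection does not increase distances, so the inequality $\|T(u,h(u))-T(y,h(y))\|\leq c_{P}\|u-y\|$ holds for this example.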

(Part 1.c: Bound for $\|u_{k+1}-u_{k+1}^{*}\|$ ) Following steps similar to those used above to bound $\|u_{k+1}-u_{k}\|$ , we calculate:

	$\displaystyle\|u_{k+1}-u_{k+1}^{*}\|$	$\displaystyle=\|T_{k+1}(u_{k},\hat{x}_{k+1})-u_{k+1}^{*}\|$
		$\displaystyle=\|T_{k+1}(u_{k},\hat{x}_{k+1})-T_{k+1}(u_{k},h(u_{k},w_{k+1}))$
		$\displaystyle~{}~{}+T_{k+1}(u_{k},h(u_{k},w_{k+1}))-u_{k+1}^{*}\|$
		$\displaystyle\leq\|T_{k+1}(u_{k},\hat{x}_{k+1})-T_{k+1}(u_{k},h(u_{k},w_{k+1}))\|$
		$\displaystyle~{}~{}+\|T_{k+1}(u_{k},h(u_{k},w_{k+1}))-T_{k+1}(u_{k+1}^{*},h(u_{k+1}^{*},w_{k+1}))\|$
		$\displaystyle\leq\frac{\eta\ell_{x}\ell_{h_{u}}}{\sqrt{d_{1}}}\left(c_{w}W(x_{k},u_{k},w_{k})+\sqrt{\tau}\sigma_{w}^{\prime}(\bar{w})\right)$
		$\displaystyle~{}~{}~{}+\eta\ell_{x}\ell_{h_{u}}\|e_{x,k+1}\|+c_{P}\left(\|u_{k+1}^{*}-u_{k}^{*}\|+\|u_{k}^{*}-u_{k}\|\right).$

(Part 1.d: Overall bounds) Putting together the bounds above, we have:


	$\displaystyle\begin{split}W_{k+1}&\leq c_{w}\left(1+c_{w}\eta\ell_{x}\ell_{h_{u}}^{2}\right)W_{k}+\sqrt{\tau}(1+c_{w}\eta\ell_{x}\ell_{h_{u}}^{2})\sigma_{w}^{\prime}(\bar{w})\\ &~{}~{}+c_{w}\ell_{h_{u}}\sqrt{d_{1}}(c_{P}+1)\|u_{k}-u_{k}^{*}\|\\ &~{}~{}+c_{w}\eta\sqrt{d_{1}}\ell_{x}\ell_{h_{u}}^{2}\|e_{x,k+1}\|\\ &~{}~{}+c_{w}\ell_{h_{u}}\sqrt{d_{1}}(c_{P}+1)\|u_{k+1}^{*}-u_{k}^{*}\|.\end{split}$			(18a)
	$\displaystyle\begin{split}\|u_{k+1}&-u_{k+1}^{*}\|\leq\frac{\eta\ell_{x}\ell_{h_{u}}}{\sqrt{d_{1}}}\left(c_{w}W(x_{k},u_{k},w_{k})+\sqrt{\tau}\sigma_{w}^{\prime}(\bar{w})\right)\\ &~{}~{}~{}+\eta\ell_{x}\ell_{h_{u}}\|e_{x,k+1}\|\\ &~{}~{}~{}+c_{P}(\|u_{k+1}^{*}-u_{k}^{*}\|+\|u_{k}^{*}-u_{k}\|).\end{split}$			(18b)

Based on these intermediate results, conditions for ISS are derived next.

(Part 2: Sufficient conditions for stability) Define:

\displaystyle\omega_{k}:=\begin{bmatrix}\|u_{k}-u_{k}^{*}\|\\ W_{k}\end{bmatrix},\nu_{k}:=\begin{bmatrix}\|u_{k+1}^{*}-u_{k}^{*}\|\\ \|e_{x,k+1}\|\end{bmatrix},\sigma_{k}:=\begin{bmatrix}0\\ \sigma_{w}^{\prime}(\bar{w})\end{bmatrix}.

Then, rewrite (18) as,

\displaystyle\omega_{k+1}\leq M_{1}\omega_{k}+M_{2}\nu_{k}+M_{3}\sigma_{k}.

(19)
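The behavior of the comparison system (19) can be illustrated with a short Python sketch that iterates the recursion with equality and constant inputs. The matrices below are hypothetical entrywise non-negative placeholders, not the $M_{1}$ , $M_{2}$ , $M_{3}$ of Theorem 1, and the function names are introduced only for this example. When the placeholder for $M_{1}$ is Schur, the iterates approach the fixed point $(I-M_{1})^{-1}(M_{2}\nu+M_{3}\sigma)$, which scales with the size of the inputs.

```python
import numpy as np

def spectral_radius(M):
    """Largest eigenvalue magnitude; M is Schur when this is below 1."""
    return float(np.max(np.abs(np.linalg.eigvals(M))))

def iterate_comparison_system(M1, M2, M3, omega0, nu, sigma, steps):
    """Iterate omega_{k+1} = M1 omega_k + M2 nu + M3 sigma (equality in (19))."""
    omega = np.asarray(omega0, dtype=float)
    for _ in range(steps):
        omega = M1 @ omega + M2 @ nu + M3 @ sigma
    return omega

def steady_state(M1, M2, M3, nu, sigma):
    """Fixed point (I - M1)^{-1} (M2 nu + M3 sigma) for a Schur M1."""
    n = M1.shape[0]
    return np.linalg.solve(np.eye(n) - M1, M2 @ nu + M3 @ sigma)

# Hypothetical matrices and constant inputs (illustration only).
M1 = np.array([[0.6, 0.1], [0.2, 0.5]])
M2 = np.array([[0.3, 0.1], [0.2, 0.4]])
M3 = np.array([[0.0, 0.0], [0.0, 1.0]])
nu = np.array([0.05, 0.02])
sigma = np.array([0.0, 0.01])

omega_200 = iterate_comparison_system(M1, M2, M3, [1.0, 2.0], nu, sigma, 200)
omega_inf = steady_state(M1, M2, M3, nu, sigma)
```

Setting `nu` and `sigma` to zero in this sketch makes the iterates decay geometrically to the origin, which mirrors the role of the first term in the bounds derived below.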

By applying (19) recursively $k+1$ times, we obtain:

	$\displaystyle\omega_{k+1}$	$\displaystyle\leq(M_{1})^{k+1}\omega_{0}$		(20)
		$\displaystyle~{}~{}~{}+\sum_{s=0}^{k+1}(M_{1})^{k+1-s}M_{2}\nu_{s}+\sum_{s=0}^{k+1}(M_{1})^{k+1-s}M_{3}\sigma_{s}.$		(21)

Next, recall from (Jiang and Wang, 2001, Ex. 3.4) that for a Schur matrix $M_{1}$ , there exist constants $r_{M_{1}}>0$ and $c_{M_{1}}\in[0,1)$ such that $\|(M_{1})^{k}\|\leq r_{M_{1}}c_{M_{1}}^{k}$ . By direct computation of the eigenvalues of $M_{1}$ , we may enforce $M_{1}$ to be Schur when $\tau>0$ satisfies:

\displaystyle\tau>\frac{1}{d_{3}}\log\left(\frac{d_{2}}{d_{1}}\right).

(22)

This condition critically relies on the fact that $\eta>0$ is chosen so that $c_{P}\in(0,1)$ . Then, taking the norm on both sides (since the quantities are non-negative), and using the triangle inequality, one gets:

\displaystyle\begin{split}\|\omega_{k+1}\|&\leq r_{M_{1}}(c_{M_{1}})^{k+1}\|\omega_{0}\|\\ &+r_{M_{1}}\sum_{s=0}^{k+1}(c_{M_{1}})^{k+1-s}\|M_{2}\|\|\nu_{s}\|\\ &+r_{M_{1}}\sum_{s=0}^{k+1}(c_{M_{1}})^{k+1-s}\|M_{3}\|\|\sigma_{s}\|.\end{split}

(23)
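The sampling-period condition (22) can be checked numerically. The sketch below uses the definition $c_{w}:=e^{-d_{3}\tau/2}\sqrt{d_{2}/d_{1}}$ from Part 1.a and shows that (22) is exactly the requirement $c_{w}<1$ . The function names and the values of $d_{1}$ , $d_{2}$ , $d_{3}$ are hypothetical; whether $c_{w}<1$ suffices for $M_{1}$ to be Schur also depends on the remaining entries of $M_{1}$ and on the choice of $\eta$ , which this sketch does not model.

```python
import math

def contraction_factor(tau, d1, d2, d3):
    """c_w = exp(-d3 * tau / 2) * sqrt(d2 / d1)."""
    return math.exp(-d3 * tau / 2.0) * math.sqrt(d2 / d1)

def min_sampling_period(d1, d2, d3):
    """Threshold on tau from (22): (1 / d3) * log(d2 / d1)."""
    return math.log(d2 / d1) / d3

# Hypothetical Lyapunov constants with d2 >= d1 > 0 and d3 > 0.
d1, d2, d3 = 0.5, 2.0, 1.0
tau_min = min_sampling_period(d1, d2, d3)
c_w_above = contraction_factor(1.1 * tau_min, d1, d2, d3)  # tau satisfies (22)
c_w_below = contraction_factor(0.9 * tau_min, d1, d2, d3)  # tau violates (22)
```

The threshold grows with the ratio $d_{2}/d_{1}$ and shrinks with the decay rate $d_{3}$ , so a plant that converges faster tolerates a shorter sampling period in this sketch.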

Define the following:


$\displaystyle\bar{\nu}$	$\displaystyle:=\sup_{0\leq s\leq k}\|\nu_{s}\|,\,\,\bar{M}_{2}:=\|M_{2}\|,$	(24a)
$\displaystyle\bar{\sigma}$	$\displaystyle:=\sup_{0\leq s\leq k}\|\sigma_{s}\|,\,\,\bar{M}_{3}:=\|M_{3}\|.$	(24b)

Then, (23) becomes,

\displaystyle\begin{split}\|\omega_{k+1}\|&\leq r_{M_{1}}(c_{M_{1}})^{k+1}\|\omega_{0}\|+r_{M_{1}}\left(\sum_{s=1}^{k+1}(c_{M_{1}})^{s}\right)\bar{M}_{2}\|\bar{\nu}\|\\ &+r_{M_{1}}\left(\sum_{s=1}^{k+1}(c_{M_{1}})^{s}\right)\bar{M}_{3}\|\bar{\sigma}\|.\end{split}

(25)

Equivalently,

\displaystyle\begin{split}\|\omega_{k+1}\|&\leq r_{M_{1}}(c_{M_{1}})^{k+1}\|\omega_{0}\|\\ &+r_{M_{1}}c_{M_{1}}\left(\sum_{s=0}^{k}(c_{M_{1}})^{s}\right)\bar{M}_{2}\|\bar{\nu}\|\\ &+r_{M_{1}}c_{M_{1}}\left(\sum_{s=0}^{k}(c_{M_{1}})^{s}\right)\bar{M}_{3}\|\bar{\sigma}\|.\end{split}

(26)

Bounding each finite sum in the second and third terms by the geometric series $\sum_{s=0}^{\infty}(c_{M_{1}})^{s}=1/(1-c_{M_{1}})$ , we obtain:

\displaystyle\begin{split}\|\omega_{k+1}\|&\leq r_{M_{1}}(c_{M_{1}})^{k+1}\|\omega_{0}\|\\ &+r_{M_{1}}\frac{c_{M_{1}}}{1-c_{M_{1}}}\left(\bar{M}_{2}\|\bar{\nu}\|+\bar{M}_{3}\|\bar{\sigma}\|\right).\end{split}

(27)

Finally, we apply the quadratic bounds on the Lyapunov function to obtain the final form. By Assumption 3, we may write:

\displaystyle m_{1}\left\|\begin{bmatrix}x_{k}-x_{k}^{*}\\ u_{k}-u_{k}^{*}\end{bmatrix}\right\|\leq\|\omega_{k}\|\leq m_{2}\left\|\begin{bmatrix}x_{k}-x_{k}^{*}\\ u_{k}-u_{k}^{*}\end{bmatrix}\right\|,

(28)

where

\displaystyle m_{1}:=\min\left\{1,\sqrt{d_{1}}\right\},\,\,m_{2}:=\max\left\{1,\sqrt{d_{2}}\right\}.

(29)

Substituting (28) into (27), we obtain:

\displaystyle\begin{split}&\left\|\begin{bmatrix}x_{k}-x_{k}^{*}\\ u_{k}-u_{k}^{*}\end{bmatrix}\right\|\leq\frac{r_{M_{1}}m_{2}}{m_{1}}(c_{M_{1}})^{k+1}\left\|\begin{bmatrix}x_{0}-x_{0}^{*}\\ u_{0}-u_{0}^{*}\end{bmatrix}\right\|\\ &~{}~{}+\frac{r_{M_{1}}c_{M_{1}}\bar{M}_{2}}{m_{1}(1-c_{M_{1}})}\left\|\begin{bmatrix}\sup_{1\leq s\leq k}\|u_{s}^{*}-u_{s-1}^{*}\|\\ \sup_{0\leq s\leq k}\|e_{x,s}\|\end{bmatrix}\right\|\\ &~{}~{}+\frac{r_{M_{1}}c_{M_{1}}\bar{M}_{3}}{m_{1}(1-c_{M_{1}})}\left\|\begin{bmatrix}0\\ \sigma_{w}(\bar{w})\end{bmatrix}\right\|.\end{split}

(30)

The bound then follows by noticing that:

	$\displaystyle\|e_{x,k}\|$	$\displaystyle=\|\hat{x}_{k}-x_{k}\|$
		$\displaystyle=\|\hat{p}(\zeta_{k})-x_{k}\|$
		$\displaystyle\leq\|\hat{p}(\zeta_{k})-p(\zeta_{k})\|+\|p(q(x_{k}))-x_{k}\|$

where the last step follows from the triangle inequality, and then by using the fact that $\|p(q(x_{k}))-x_{k}\|\leq\varepsilon$ and bounding the first term with $\sup_{\zeta\in\mathcal{Q}}\|\hat{p}(\zeta)-p(\zeta)\|$ .

	$\displaystyle\|z_{k}\|$	$\displaystyle\leq\frac{r_{M_{1}}m_{2}}{m_{1}}c_{M_{1}}^{k+1}\|z_{0}\|+b\|M_{3}\|\sigma_{w}\left(\sup_{0\leq s\leq\tau k}\|\dot{w}(s)\|\right)$
		$\displaystyle~{}~{}~{}~{}~{}~{}+b\|M_{2}\|\left\|\begin{bmatrix}\sup_{1\leq s\leq k}\|u_{s}^{*}-u_{s-1}^{*}\|\\ \sup_{\zeta\in\mathcal{Q}}\|\hat{p}(\zeta)-p(\zeta)\|+\bar{\varepsilon}\end{bmatrix}\right\|,$		(11)

	$\displaystyle\|z(t)\|$	$\displaystyle\leq\beta(\|z_{0}\|,t)+\gamma_{w}\left(\sup_{0\leq s\leq t}\|\dot{w}(s)\|\right)+\gamma_{u}\left(\Delta_{u}^{*}\right)$
		$\displaystyle~{}~{}~{}~{}~{}~{}+\gamma_{p}\left(\sup_{\zeta\in\mathcal{Q}}\|\hat{p}(\zeta)-p(\zeta)\|+\bar{\varepsilon}\right)$		(12)