
Online Adversarial Stabilization of
Unknown Linear Time-Varying Systems

Jing Yu, Varun Gupta, and Adam Wierman This work was supported by Caltech/Amazon AWS AI4Science fellowships and the National Science Foundation under grants CNS-2146814, CPS-2136197, CNS-2106403, ECCS-2200692, and NGSDI-2105648. J. Yu and A. Wierman are with the Department of Computing and Mathematical Sciences, California Institute of Technology {jing, adamw}@caltech.edu. V. Gupta is with the Booth School of Business, University of Chicago guptav@uchicago.edu
Abstract

This paper studies the problem of online stabilization of an unknown discrete-time linear time-varying (LTV) system under bounded non-stochastic (potentially adversarial) disturbances. We propose a novel algorithm based on convex body chasing (CBC). Under the assumption of infrequently changing or slowly drifting dynamics, the algorithm guarantees bounded-input-bounded-output stability in the closed loop. Our approach avoids system identification and applies, with minimal disturbance assumptions, to a variety of LTV systems of practical importance. We demonstrate the algorithm numerically on examples of LTV systems including Markov linear jump systems with finitely many jumps.

I Introduction

Learning-based control of linear time-invariant (LTI) systems in the context of linear quadratic regulators (LQR) has seen considerable progress. However, many real-world systems are time-varying in nature. For example, the grid topology in power systems can change over time due to manual operations or unpredictable line failures [1]. Therefore, there is increasing recent interest in extending learning-based control of LTI systems to the linear time-varying (LTV) setting [2, 3, 4, 5, 6].

LTV systems are widely used to approximate and model real-world dynamical systems such as robotics [7] and autonomous vehicles [8]. In this paper, we consider LTV systems with dynamics of the following form:

xt+1=Atxt+Btut+wt,x_{t+1}=A_{t}x_{t}+B_{t}u_{t}+w_{t}, (1)

where x_{t}\in\mathbb{R}^{n}, u_{t}\in\mathbb{R}^{m}, and w_{t} denote the state, the control input, and the bounded and potentially adversarial disturbance, respectively. We use \theta_{t}=[A_{t}\ B_{t}] to succinctly denote the system matrices at time step t.

On the one hand, offline control design for LTV systems is well-established in the setting where the underlying LTV model is known [9, 10, 11, 12, 13]. Additionally, recent work has started focusing on regret analysis and non-stochastic disturbances for known LTV systems [2, 14].

On the other hand, online control design for LTV systems where the model is unknown is more challenging. Historically, there is a rich body of work on adaptive control design for LTV systems [15, 16, 17]. Also related is the system identification literature for LTV systems [18, 19, 20], which estimates the (generally assumed to be stable) system to allow the application of the offline techniques.

In recent years, the potential to leverage modern data-driven techniques for controller design of unknown linear systems has led to a resurgence of work in both the LTI and LTV settings. There is a growing literature on “learning to control” unknown LTI systems under stochastic or no noise [21, 22, 23]. Learning under bounded and potentially adversarial noises poses additional challenges, but online stabilization [24] and regret [25] results have been obtained.

In comparison, there is much less work on learning-based control design for unknown LTV systems. One typical approach, exemplified by [3, 26, 27], derives stabilizing controllers under the assumption that offline data representing the input-output behavior of (1) is available and therefore an offline stabilizing controller can be pre-computed. Similar finite-horizon settings where the algorithm has access to offline data [28], or can iteratively collect data [29], were also considered. In the context of online stabilization, i.e., when offline data is not available, work has derived stabilizing controllers for LTV systems through the use of predictions of \theta_{t}, e.g., [30]. Finally, another line of work focuses on designing regret-optimal controllers for LTV systems [31, 6, 4, 5, 32]. However, with the exception of [30], existing work on online control of unknown LTV systems shares the common assumption of either open-loop stability or knowledge of an offline stabilizing controller. Moreover, the disturbances are generally assumed to be zero or stochastic noise independent of the states and inputs.

In this paper, we propose an online algorithm for stabilizing unknown LTV systems under bounded, potentially adversarial disturbances. Our approach uses convex body chasing (CBC), which is an online learning problem where one must choose a sequence of points within sequentially presented convex sets with the aim of minimizing the sum of distances between the chosen points [33, 34]. CBC has emerged as a promising tool in online control, with most work making connections to a special case called nested convex body chasing (NCBC), where the convex sets are sequentially nested within the previous set [35, 36]. In particular, [37] first explored the use of NCBC for learning-based control of time-invariant nonlinear systems. NCBC was also used in combination with System Level Synthesis to design a distributed controller for networked systems [24] and in combination with model predictive control [38] for LTI system control as a promising alternative to system identification based methods. However, this line of work depends fundamentally on the time invariance of the system, which results in nested convex sets. LTV systems do not yield nested sets and therefore represent a significant challenge.

This work addresses this challenge and presents a novel online control scheme (Algorithm 1) based on (non-nested) CBC techniques that guarantees bounded-input-bounded-output (BIBO) stability for unknown LTV systems as a function of the total model variation \sum_{t=1}^{\infty}\left\|\theta_{t}-\theta_{t-1}\right\|, without predictions or offline data, under bounded and potentially adversarial disturbances (Theorem 1). This result implies that when the total model variation is finite or grows sublinearly, BIBO stability of the closed loop is guaranteed (Corollaries 1 and 2). In particular, our result depends on a refined analysis of the CBC technique (Lemma 1) and on perturbation analysis of the Lyapunov equation. This contrasts with previous NCBC-based works for time-invariant systems, where the competitive ratio guarantee of NCBC directly applies and the main technical tool is the robustness of the model-based controller, which is proven using a Lipschitz bound of a quadratic program in [24] and is directly assumed to exist in [37].

We illustrate the proposed algorithm via numerical examples in Section IV to corroborate the stability guarantees. We demonstrate how the proposed algorithm can be used for data collection and complement data-driven methods like [27, 3, 28]. Further, the numerics highlight that the proposed algorithm can be efficiently implemented by leveraging the linearity of (1) despite the computational complexity of CBC algorithms in general (see Section III-B for details).

Notation. We use 𝕊n1\mathbb{S}^{n-1} to denote the unit sphere in n\mathbb{R}^{n} and +\mathbb{N}_{+} for positive integers. For t,s+t,\,s\in\mathbb{N}_{+}, we use [t:s][t:s] as shorthand for the set of integers {t,t+1,,s}\{t,\,t+1,\,\ldots,s\} and [t][t] for {1, 2,,t}\{1,\,2,\,\ldots,t\}. Unless otherwise specified, \left\|\cdot\right\| is the operator norm. We use ρ()\rho(\cdot) for the spectral radius of a matrix.

II Preliminaries

In this section, we state the model assumptions underlying our work and review key results for convex body chasing, which we leverage in our algorithm design and analysis.

II-A Stability and model assumptions

We study the dynamics in (1) under the following standard assumptions.

Assumption 1

The disturbances are bounded: wtW\left\|w_{t}\right\|_{\infty}\leq W for all t0t\geq 0.

Assumption 2

The unknown time-varying system matrices {θt}t=1\{\theta_{t}\}_{t=1}^{\infty} belong to a known (potentially large) polytope Θ\Theta such that θtΘ\theta_{t}\in\Theta for all tt. Moreover, there exists κ>0\kappa>0 such that θκ\left\|\theta\right\|\leq\kappa and θ\theta is stabilizable for all θΘ\theta\in\Theta.

Bounded and non-stochastic (potentially adversarial) disturbances are a common model in both online learning and control problems [39, 40]. Since we make no assumption on how large the bound W is, Assumption 1 models a variety of scenarios, such as bounded and/or correlated stochastic noise, state-dependent disturbances (e.g., the linearization and discretization error of nonlinear continuous-time dynamics), and potentially adversarial disturbances. Assumption 2 is standard in learning-based control, e.g., [41, 42].

We additionally assume a known quadratic cost function of the state and control input to be minimized at every time step t, e.g., x_{t}^{\top}Qx_{t}+u_{t}^{\top}Ru_{t} with Q,\,R\succ 0. For a given LTI system model \theta=[A\ B] and cost matrices Q,\,R, we denote by K=\textsc{LQR}(\theta;Q,R) the optimal feedback gain for the corresponding infinite-horizon LQR problem.
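The certainty-equivalent synthesis step LQR(θ; Q, R) can be computed from the discrete-time algebraic Riccati equation. Below is a minimal sketch (not from the paper), assuming SciPy is available and Q, R are given as matrices; the sign convention is chosen so that u = Kx and A + BK is the closed-loop matrix.

```python
import numpy as np
from scipy.linalg import solve_discrete_are

def lqr(theta, Q, R):
    """Infinite-horizon discrete-time LQR gain for theta = [A B],
    returned so that u = K x is optimal and A + B K is Schur stable."""
    n = Q.shape[0]
    A, B = theta[:, :n], theta[:, n:]
    P = solve_discrete_are(A, B, Q, R)  # positive definite DARE solution
    K = -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    return K
```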

Remark 1

Representing model uncertainty as a convex compact parameter set in which every model is stabilizable is not always possible. In particular, if a parameter set \Theta has a few singular points where (A,B) loses stabilizability, such as when B=0, a simple heuristic is to ignore these points in the algorithm, since the underlying true system matrices \theta_{t} are assumed to be stabilizable.

II-B Convex body chasing

Convex Body Chasing (CBC) is a well-studied online learning problem [35, 36]. At every round t\in\mathbb{N}_{+}, the player is presented with a convex body/set \mathcal{K}_{t}\subset\mathbb{R}^{n}. The player selects a point q_{t}\in\mathcal{K}_{t} with the objective of minimizing the total path length of the selections over T rounds, i.e., \sum_{t=1}^{T}\|q_{t}-q_{t-1}\| for a given initial condition q_{0}\not\in\mathcal{K}_{1}. There are many known algorithms for the CBC problem with a competitive ratio guarantee: the cost incurred by the algorithm is within a constant factor of the total path length incurred by the offline optimal algorithm, which has knowledge of the entire sequence of bodies. We will use CBC to select \theta_{t}'s that are consistent with observed data.

II-B1 The nested case

A special case of CBC is the nested convex body chasing (NCBC) problem, where \mathcal{K}_{t}\subseteq\mathcal{K}_{t-1}. A known algorithm for NCBC is to select the Steiner point of \mathcal{K}_{t} at time t [36]. The Steiner point of a convex set \mathcal{K} can be interpreted as the average of the extreme points of \mathcal{K} and is defined as \textsf{st}(\mathcal{K}):=\mathbb{E}_{v:\|v\|\leq 1}\left[g_{\mathcal{K}}(v)\right], where g_{\mathcal{K}}(v):=\text{argmax}_{x\in\mathcal{K}}v^{\top}x and the expectation is taken with respect to the uniform distribution over the unit ball. The intuition is that the Steiner point remains "deep" inside the (nested) feasible region, so that when this point becomes infeasible due to a new convex set, the feasible region must shrink considerably, which indicates that the offline optimal must also have moved a lot. Given the initial condition q_{0}\not\in\mathcal{K}_{1}, the Steiner point selector achieves a competitive ratio of \mathcal{O}(n) against the offline optimal: for all T\in\mathbb{N}_{+}, \sum_{t=1}^{T}\|\textsf{st}(\mathcal{K}_{t})-\textsf{st}(\mathcal{K}_{t-1})\|\leq\mathcal{O}(n)\cdot\text{OPT}, where OPT is the offline optimal total path length. Many works combine the Steiner point algorithm for NCBC with existing control methods to perform learning-based online control for LTI systems, e.g., [24, 37, 38].
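As an illustration (not part of the paper's algorithm), the Steiner point of a polytope can be estimated by Monte Carlo: sample directions v on the unit sphere and average the support-point maximizers g_K(v), each of which is a linear program. A minimal sketch, assuming SciPy and a bounded polytope {x : A_ub x ≤ b_ub}; sampling on the sphere rather than the ball is equivalent here because g_K(v) depends only on the direction of v.

```python
import numpy as np
from scipy.optimize import linprog

def steiner_point(A_ub, b_ub, num_samples=500, rng=None):
    """Monte Carlo estimate of the Steiner point of the bounded polytope
    K = {x : A_ub @ x <= b_ub}: average of g_K(v) = argmax_{x in K} v'x
    over random directions v on the unit sphere."""
    rng = np.random.default_rng(rng)
    n = A_ub.shape[1]
    points = []
    for _ in range(num_samples):
        v = rng.standard_normal(n)
        v /= np.linalg.norm(v)
        # linprog minimizes, so maximize v'x by minimizing -v'x
        res = linprog(-v, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * n)
        points.append(res.x)
    return np.mean(points, axis=0)
```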

II-B2 General CBC

For general CBC problems, we can no longer take advantage of the nested property of the convex bodies. One may consider naively applying NCBC algorithms when the convex bodies happen to be nested and restarting the NCBC algorithm when they are not. However, due to the myopic nature of NCBC algorithms, which try to remain deep inside each convex set, they no longer guarantee a competitive ratio when used this way. Instead, [33] generalizes ideas from NCBC and proposes an algorithm that selects the functional Steiner point of the work function.

Definition 1 (Functional Steiner point)

For a convex function f:nf:\mathbb{R}^{n}\to\mathbb{R}, the functional Steiner point of ff is

\textsf{st}(f)=-n\cdot\fint_{v:\left\|v\right\|=1}f^{*}(v)\,v\,dv, (2)

where \fint_{x\in\mathcal{S}}f(x)\,dx denotes the normalized value \frac{\int_{x\in\mathcal{S}}f(x)\,dx}{\int_{x\in\mathcal{S}}1\,dx} of f(x) on the set \mathcal{S}, and

f^{*}(v):=\inf_{x\in\mathbb{R}^{n}}f(x)-\left\langle x,v\right\rangle (3)

is the Fenchel conjugate of ff.

The CBC algorithm selects the functional Steiner point of the work function, which records the smallest cost required to satisfy a sequence of requests while ending in a given state, thereby encapsulating information about the offline-optimal cost for the CBC problem.

Definition 2 (Work function)

Given an initial point q0nq_{0}\in\mathbb{R}^{n}, and convex sets 𝒦1,,𝒦tn\mathcal{K}_{1},\ldots,\mathcal{K}_{t}\subset\mathbb{R}^{n}, the work function at time step tt evaluated at a point xnx\in\mathbb{R}^{n} is given by:

ωt(x)=minqs𝒦sxqt+s=1tqsqs1.\omega_{t}(x)=\min_{q_{s}\in\mathcal{K}_{s}}\left\|x-q_{t}\right\|+\sum_{s=1}^{t}\left\|q_{s}-q_{s-1}\right\|. (4)

Importantly, it is shown that the functional Steiner points of the work functions are valid, i.e., st(ωt)𝒦t\textsf{st}(\omega_{t})\in\mathcal{K}_{t} for all tt [33]. On a high level, selecting the functional Steiner point of the work function helps the algorithm stay competitive against the currently estimated offline optimal cost via the work function, resulting in a competitive ratio of nn against the offline optimal cost (OPT) for general CBC problems,

t=1Tst(ωt)st(ωt+1)nOPT.\sum_{t=1}^{T}\|\textsf{st}(\omega_{t})-\textsf{st}(\omega_{t+1})\|\leq n\cdot\text{OPT}. (5)

Given the non-convex nature of (2) and (4), computing the functional Steiner point of the work function is, in general, challenging. However, in the proposed algorithm, we are able to leverage the linearity of the LTV system (1) to numerically approximate both objects efficiently; see Section III-B.

III Main Results

We present our proposed online control algorithm for stabilizing the unknown LTV system (1) under bounded and potentially adversarial disturbances in Algorithm 1. After observing the latest transition from x_{t},\,u_{t} to x_{t+1} at time t+1 according to (1) (line 2), the algorithm constructs the set of all feasible models \widehat{\theta}_{t} (line 3) that are consistent with the observation, i.e., for which there exists an admissible disturbance \widehat{w}_{t} satisfying Assumption 1 such that the state transition from x_{t},\,u_{t} to x_{t+1} can be explained by the tuple (\widehat{\theta}_{t}, \widehat{w}_{t}). We call this set the consistent model set \mathcal{P}_{t}, and we note that the unknown true dynamics \theta_{t}=[A_{t}\ B_{t}] belongs to \mathcal{P}_{t}. The algorithm then selects a hypothesis model from the consistent model set \mathcal{P}_{t} using the CBC algorithm by computing the functional Steiner point (2) of the work function (4) with respect to the history of consistent parameter sets \mathcal{P}_{1},\,\ldots,\,\mathcal{P}_{t} (line 4). In particular, we present an efficient implementation of the functional Steiner point chasing algorithm in Section III-B by taking advantage of the fact that the \mathcal{P}_{t}'s are polytopes described by intersections of half-spaces. The implementation is summarized in Algorithm 2. Based on the selected hypothesis model \widehat{\theta}_{t}, a certainty-equivalent LQR controller is synthesized (line 5) and the state-feedback control action is computed (line 6).

Note that, by construction, at time step t\in\mathbb{N}_{+} we perform certainty-equivalent control \widehat{K}_{t-1} based on a hypothesis model \widehat{\theta}_{t-1} computed using retrospective data, even though the control action (u_{t}=\widehat{K}_{t-1}x_{t}) is applied to the dynamics (\theta_{t}) about which we do not yet have any information. In order to guarantee stability, we would like \widehat{K}_{t-1} to stabilize the "future" dynamics (\theta_{t}). This is the main motivation behind our choice of the CBC technique instead of regression-based techniques for model selection. Thanks to the competitive ratio guarantee (5) of the functional Steiner point selector, when the true model variation is "small," our previously selected hypothesis model stays "consistent" in the sense that \widehat{K}_{t-1} can stabilize \theta_{t} despite the potentially adversarial or state-dependent disturbances. On the other hand, when the true model variation is "large," \widehat{K}_{t-1} does not stabilize \theta_{t}, and we see growth in the state norm. Therefore, our final state bound is in terms of the total variation of the true model.

We show in the next section that, by drawing connections between the stability of the closed-loop system and the path-length cost of the hypothesis models selected via CBC, we are able to stabilize the unknown LTV system without any identification requirements, e.g., the selected hypothesis models in Algorithm 1 need not be close to the true models. It has been observed that, even in the LTI setting, system identification can result in large-norm transient behavior and numerical stability issues if the underlying unknown system is open-loop unstable or subject to non-stochastic disturbances, motivating the development of NCBC-based online control methods [25, 24, 37]. In the LTV setting, it is not sufficient to use NCBC ideas due to the time variation of the model; however, the intuition for the use of CBC is similar. In fact, it can be additionally beneficial to bypass identification in settings where the true model is a moving target, which makes identification more challenging. We illustrate this numerically in Section IV.

Input: W>0W>0, Θn×(n+m)\Theta\subset\mathbb{R}^{n\times(n+m)}
Initialize : u0=0u_{0}=0, θ^0Θ\widehat{\theta}_{0}\in\Theta
1 for t+1=1,2,t+1=1,2,\ldots do
2       Observe xt+1x_{t+1}
3       Construct consistent set 𝒫t:={θ=[A,B]:xt+1AxtButW}Θ\mathcal{P}_{t}:=\left\{\theta=[A,B]:\left\|x_{t+1}-Ax_{t}-Bu_{t}\right\|_{\infty}\leq W\right\}\cap\Theta
4       Select hypothesis model θ^tCBC({𝒫s}s=1t;θ^0)\widehat{\theta}_{t}\leftarrow\textsc{CBC}(\{\mathcal{P}_{s}\}_{s=1}^{t};\widehat{\theta}_{0})
5       Synthesize controller K^tLQR(θ^t;Q,R)\widehat{K}_{t}\leftarrow\textsc{LQR}\left(\widehat{\theta}_{t};Q,R\right)
6       Compute feedback control input ut+1=K^txt+1u_{t+1}=\widehat{K}_{t}x_{t+1}
7 end for
Algorithm 1 Unknown LTV stabilization
Input: 𝒫1\mathcal{P}_{1}, \ldots, 𝒫t\mathcal{P}_{t}, θ^0\widehat{\theta}_{0}, NN
Output: θ^t\widehat{\theta}_{t}
1 for i=1,2,\ldots,N do
2       Sample v_{i} uniformly from \mathbb{S}^{n-1}
3       h_{i}\leftarrow optimal value of the SOCP (12) with direction v_{i}
4 end for
θ^tprojΘ𝒫t(nNi=1Nhivi)\widehat{\theta}_{t}\leftarrow\textsf{proj}_{\Theta\cap\mathcal{P}_{t}}\left(-\frac{n}{N}\sum_{i=1}^{N}h_{i}v_{i}\right)
Algorithm 2 CBC
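For concreteness, the following is a minimal Python sketch of the loop in Algorithm 1 run against a simulator; it is an illustration under assumptions rather than the paper's implementation. The helpers lqr (certainty-equivalent LQR synthesis) and cbc_select (an implementation of Algorithm 2, sketched in Section III-B) are assumed, dynamics(t, x, u) is a hypothetical simulator returning x_{t+1}, theta_hat0 is the flattened initial hypothesis, Q and R are assumed to be cost matrices, and for brevity the consistent sets omit the intersection with Θ.

```python
import numpy as np

def consistent_set(x, u, x_next, W):
    """Half-space description (A_ub, b_ub) of P_t = {theta :
    ||x_next - A x - B u||_inf <= W}, with theta = [A B] flattened
    row-wise into a vector; two half-spaces per state coordinate."""
    n, z = x.shape[0], np.concatenate([x, u])
    d = n * z.shape[0]
    A_ub, b_ub = [], []
    for i in range(n):
        row = np.zeros(d)
        row[i * z.shape[0]:(i + 1) * z.shape[0]] = z
        A_ub.append(row);  b_ub.append(x_next[i] + W)   #  z . theta_i <= x_i + W
        A_ub.append(-row); b_ub.append(W - x_next[i])   # -z . theta_i <= W - x_i
    return np.array(A_ub), np.array(b_ub)

def run_algorithm1(dynamics, T, W, Q, R, theta_hat0, x0, lqr, cbc_select):
    """Sketch of Algorithm 1: observe the transition, build P_t,
    chase it with CBC, and apply certainty-equivalent LQR feedback."""
    n, m = Q.shape[0], R.shape[0]
    x, u = x0, np.zeros(m)                             # u_0 = 0
    sets = []                                          # history P_1, ..., P_t
    for t in range(T):
        x_next = dynamics(t, x, u)                     # line 2: observe x_{t+1}
        sets.append(consistent_set(x, u, x_next, W))   # line 3: build P_t
        theta_hat = cbc_select(sets, theta_hat0)       # line 4: Algorithm 2
        K_hat = lqr(theta_hat.reshape(n, -1), Q, R)    # line 5: LQR(theta_hat)
        x, u = x_next, K_hat @ x_next                  # line 6: u_{t+1}
    return x
```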

III-A Stability Analysis

The main result of this paper is the BIBO stability guarantee for Algorithm 1 in terms of the true model variation and the disturbance bound. We sketch the proof in this section and refer to Section -C for the formal proof. This result depends on a refined analysis of the competitive ratio for the functional Steiner point chasing algorithm introduced in [33], which is stated as follows.

Lemma 1 (Partial-path competitive ratio)

For t+t\in\mathbb{N}_{+}, let s,e[t]s,\,e\in[t] and s<es<e, and let Θn\Theta\subset\mathbb{R}^{n} be a convex compact set. Denote Δ^[s,e]:=τ=s+1est(ωτ)st(ωτ1)F\widehat{\Delta}_{[s,e]}:=\sum_{\tau=s+1}^{e}\left\|\textsf{st}(\omega_{\tau})-\textsf{st}(\omega_{\tau-1})\right\|_{F} as the partial-path cost of the functional Steiner point selector during interval [s,e][s,e] and {OPTτ}τ=1t\{\textsc{OPT}_{\tau}\}_{\tau=1}^{t} as the (overall) offline optimal selection for 𝒦1,,𝒦tΘ\mathcal{K}_{1},\,\ldots,\,\mathcal{K}_{t}\subset\Theta. The functional Steiner point chasing algorithm has the following competitive ratio,

Δ^[s,e]\displaystyle\widehat{\Delta}_{[s,e]} n(dia(Θ)+2κ+τ=s+1eOPTτOPTτ1F).\displaystyle\leq n\left(\textsf{dia}(\Theta)+2\kappa+\sum_{\tau=s+1}^{e}\left\|\textsc{OPT}_{\tau}-\textsc{OPT}_{\tau-1}\right\|_{F}\right).

where \textsf{dia}(\Theta):=\max_{\theta_{1},\,\theta_{2}\in\Theta}\left\|\theta_{1}-\theta_{2}\right\|_{F} denotes the diameter of \Theta and \kappa:=\max_{\theta\in\Theta}\left\|\theta\right\|_{F}.

Proof.

See Section -A. ∎

Theorem 1 (BIBO Stability)

Under Assumptions 1 and 2, the closed loop of (1) under Algorithm 1 is BIBO stable such that for all t\geq 0,

xtWc1s=0t2c2Δ[s,t1]ρLts\left\|x_{t}\right\|\leq W\cdot c_{1}\sum_{s=0}^{t-2}c_{2}^{\Delta_{[s,t-1]}}\rho_{L}^{t-s}

where Δ[s,t1]:=τ=s+1t1θτθτ1F{\Delta}_{[s,t-1]}:=\sum_{\tau=s+1}^{t-1}\left\|{\theta}_{\tau}-{\theta}_{\tau-1}\right\|_{F} is the true model variation, WW is the disturbance bound, and c1,c2>0,ρL(0,1)c_{1},\,c_{2}>0,\,\rho_{L}\in(0,1) are constants that depend on the system-theoretical quantities of the worst-case model in the parameter set Θ\Theta.

Proof Sketch: At a high level, the structure of our proof is as follows. We first use the fact that our time-varying feedback gain \widehat{K}_{t} is computed according to a hypothesis model from the consistent model set. Therefore, we can characterize the closed-loop dynamics in terms of the consistent models \widehat{\theta}_{t} and \widehat{K}_{t}. Specifically, consider a time step t where we take the action u_{t}=\widehat{K}_{t-1}x_{t} after observing x_{t}. Then, we observe x_{t+1}=A_{t}x_{t}+B_{t}u_{t}+w_{t} and select a new hypothesis model \widehat{\theta}_{t}=[\widehat{A}_{t}\ \widehat{B}_{t}] that is consistent with this new observation. Since we have selected a consistent hypothesis model, there is some admissible disturbance \widehat{w}_{t} satisfying Assumption 1 such that

xt+1\displaystyle x_{t+1} =(At+BtK^t1)xt+wt=(A^t+B^tK^t1)xt+w^t.\displaystyle=\left(A_{t}+B_{t}\widehat{K}_{t-1}\right)x_{t}+w_{t}=\left(\widehat{A}_{t}+\widehat{B}_{t}\widehat{K}_{t-1}\right)x_{t}+\widehat{w}_{t}.

Without loss of generality, we assume initial condition x0=0x_{0}=0. We therefore have

xt=w^t1+s=0t2τ[t1:s+1](A^τ+B^τK^τ1)w^s.\displaystyle x_{t}=\widehat{w}_{t-1}+\sum_{s=0}^{t-2}\prod_{\tau\in[t-1:s+1]}\left(\widehat{A}_{\tau}+\widehat{B}_{\tau}\widehat{K}_{\tau-1}\right)\widehat{w}_{s}. (6)

We have two main challenges in bounding xt\left\|x_{t}\right\| in (6):

  1. \widehat{K}_{t} is computed using \widehat{\theta}_{t} in Algorithm 1, but is applied at the next time step; in (6) the factor paired with (\widehat{A}_{t},\widehat{B}_{t}) is \widehat{K}_{t-1} rather than \widehat{K}_{t}, and we only know \rho(\widehat{A}_{t}+\widehat{B}_{t}\widehat{K}_{t})<1.

  2. Naively applying submultiplicativity of the operator norm to (6) requires bounding \left\|\widehat{A}_{\tau}+\widehat{B}_{\tau}\widehat{K}_{\tau-1}\right\|. However, even when the spectral radius of such a matrix is less than 1, its operator norm can in general be greater than 1.

To address the first challenge, our key insight is that, by selecting hypothesis models via the CBC technique, in any interval where the true model variation is small, the selected hypothesis models also vary little. Specifically, by Lemma 1, we can bound the partial-path variation of the selected hypothesis models by the true model partial-path variation \Delta_{[s,e]} as follows.

Δ^[s,e]\displaystyle\widehat{\Delta}_{[s,e]} n(dia(Θ)+2κ+τ=se1OPTτ+1OPTτF)\displaystyle\leq n\left(\textsf{dia}(\Theta)+2\kappa+\sum_{\tau=s}^{e-1}\left\|\textsc{OPT}_{\tau+1}-\textsc{OPT}_{\tau}\right\|_{F}\right)
n(dia(Θ)+2κ+Δ[s,e]).\displaystyle\leq n\left(\textsf{dia}(\Theta)+2\kappa+\Delta_{[s,e]}\right). (7)

where \Theta and \kappa are from Assumption 2. A consequence of (7) is that, during intervals where the true model variation is small, we have \left(\widehat{A}_{t}+\widehat{B}_{t}\widehat{K}_{t-1}\right)\approx\left(\widehat{A}_{t}+\widehat{B}_{t}\widehat{K}_{t}\right).

For the second challenge, we leverage the concept of sequential strong stability [43], which allows bounding τ[t1:s+1](A^τ+B^τK^τ1)\left\|\prod_{\tau\in[t-1:s+1]}\left(\widehat{A}_{\tau}+\widehat{B}_{\tau}\widehat{K}_{\tau-1}\right)\right\| approximately with τ[t1:s+1]ρ(A^τ+B^τK^τ)\prod_{\tau\in[t-1:s+1]}\rho\left(\widehat{A}_{\tau}+\widehat{B}_{\tau}\widehat{K}_{\tau}\right) times 𝒪(exp(Δ[s,t1]))\mathcal{O}\left(\exp(\Delta_{[s,t-1]})\right).

We now sketch the proof. The helper lemmas are summarized in Section -B and the formal proof can be found in Section -C. Consider Lt,Htn×nL_{t},\,H_{t}\in\mathbb{R}^{n\times n} with Ht0H_{t}\succ 0 such that

A^t+B^tK^t1\displaystyle\widehat{A}_{t}+\widehat{B}_{t}\widehat{K}_{t-1} :=Ht1/2LtHt1/2.\displaystyle:=H_{t}^{1/2}L_{t}H_{t}^{-1/2}.

We use IsI_{s} as shorthand for the interval [t1:s+1][t-1:s+1]. Then each summand in (6) can be bounded as

τIs(A^τ+B^τK^τ1)\displaystyle\left\|\prod_{\tau\in I_{s}}\left(\widehat{A}_{\tau}+\widehat{B}_{\tau}\widehat{K}_{\tau-1}\right)\right\|
Ht11/2Hs+11/2(a)kIs+1Hk1/2Hk11/2(b)τIsLτ(c)\displaystyle\leq\underbrace{\left\|H_{t-1}^{1/2}\right\|\left\|H_{s+1}^{-1/2}\right\|}_{(a)}\underbrace{\prod_{k\in I_{s+1}}\left\|H_{k}^{-1/2}H_{k-1}^{1/2}\right\|}_{(b)}\underbrace{\prod_{\tau\in I_{s}}\left\|L_{\tau}\right\|}_{(c)} (8)

Therefore showing BIBO stability comes down to bounding individual terms in (8). In particular we will show that by selecting appropriate HtH_{t} and LtL_{t}, term (a) is bounded by a constant CHC_{H} that depends on system theoretical properties of the worst-case parameter in Θ\Theta. For (b) and (c), we isolate the instances when

θ^tθ^t1Fϵ\left\|\widehat{\theta}_{t}-\widehat{\theta}_{t-1}\right\|_{F}\leq\epsilon (9)

for some chosen ϵ>0\epsilon>0. For instances where (9) holds, we use the perturbation analysis of the Lyapunov equation involving the matrix A^t+B^tK^t1\widehat{A}_{t}+\widehat{B}_{t}\widehat{K}_{t-1} (Lemma 6 for (b) and Lemma 4 for (c)) to bound (b) and (c) in terms of the partial-path movement of the selected parameters Δ^[s,e]:=τ=s+1est(ωτ+1)st(ωτ)F\widehat{\Delta}_{[s,e]}:=\sum_{\tau=s+1}^{e}\left\|\textsf{st}(\omega_{\tau+1})-\textsf{st}(\omega_{\tau})\right\|_{F}. Specifically, Lemma 6 implies

Ht1/2Ht11/2\displaystyle\left\|H_{t}^{-1/2}H_{t-1}^{1/2}\right\| {eβθ^tθ^t1F2,if (9) holdsH¯otherwise,\displaystyle\leq\begin{cases}e^{\frac{\beta\left\|\widehat{\theta}_{t}-\widehat{\theta}_{t-1}\right\|_{F}}{2}},&\text{if \eqref{eq:small-movement} holds}\\ \bar{H}&\mbox{otherwise},\end{cases} (10)

where β,H¯>1\beta,\,\bar{H}>1 are constants. We also show that from Lemma 4,

Lt\displaystyle\left\|L_{t}\right\| {ρLif (9) holdsL¯otherwise,\displaystyle\leq\begin{cases}\rho_{L}&\text{if \eqref{eq:small-movement} holds}\\ \bar{L}&\mbox{otherwise},\end{cases} (11)

for ρL(0,1)\rho_{L}\in(0,1) and L¯>1\bar{L}>1 a constant.

We now plug (10) and (11) into (8). Denote by n[s,t]n_{[s,t]} the number of pairs (τ,τ1)(\tau,\tau-1) with s+1τt1s+1\leq\tau\leq t-1 where (9) fails to hold. Let Δ[s,e]:=τ=s+1eθτθτ1F{\Delta}_{[s,e]}:=\sum_{\tau=s+1}^{e}\left\|\theta_{\tau}-\theta_{\tau-1}\right\|_{F} be the true model partial-path variation. Then (8) can be bounded as

τ[t1:s+1](A^τ+B^τK^τ1)\displaystyle\left\|\prod_{\tau\in[t-1:s+1]}\left(\widehat{A}_{\tau}+\widehat{B}_{\tau}\widehat{K}_{\tau-1}\right)\right\|
CHH¯n[s,t]eβΔ^[s+1,t1]2L¯n[s,t]ρLtsn^[s,t]1\displaystyle\leq C_{H}\cdot\bar{H}^{n_{[s,t]}}\cdot e^{\frac{\beta\widehat{\Delta}_{[s+1,t-1]}}{2}}\cdot\bar{L}^{n_{[s,t]}}\cdot\rho_{L}^{t-s-\widehat{n}_{[s,t]}-1}
CH(L¯H¯ρL)Δ^[s,t1]ϵeβΔ^[s+1,t1]2ρLts1\displaystyle\leq C_{H}\left(\frac{\bar{L}\bar{H}}{\rho_{L}}\right)^{\frac{\widehat{\Delta}_{[s,t-1]}}{\epsilon_{*}}}e^{\frac{\beta\widehat{\Delta}_{[s+1,t-1]}}{2}}\cdot\rho_{L}^{t-s-1}
CH(L¯H¯ρL)n¯(𝖽𝗂𝖺(Θ)+2κ+Δ[s,t1])ϵeβn¯(𝖽𝗂𝖺(Θ)+2κ+Δ[s+1,t1])2ρLts1\displaystyle\leq C_{H}\left(\frac{\bar{L}\bar{H}}{\rho_{L}}\right)^{\frac{\bar{n}\left(\mathsf{dia}(\Theta)+2\kappa+{\Delta}_{[s,t-1]}\right)}{\epsilon_{*}}}e^{\frac{\beta\bar{n}\left(\mathsf{dia}(\Theta)+2\kappa+{\Delta}_{[s+1,t-1]}\right)}{2}}\cdot\rho_{L}^{t-s-1}
=:cc2Δ[s,t1]ρLts,\displaystyle=:c\cdot c_{2}^{\Delta_{[s,t-1]}}\rho_{L}^{t-s},

for constants c,\,c_{2} and \bar{n}:=n(n+m) the dimension of the parameter space for A_{t},\,B_{t}. In the second inequality, we used the observation that n_{[s,t]}\leq\frac{\widehat{\Delta}_{[s,t-1]}}{\epsilon}, and in the last inequality we used Lemma 1. Combined with (6) and Assumption 1, this proves the desired bound. \blacksquare

An immediate consequence of Theorem 1 is that when the model variation in (1) is bounded or sublinear, Algorithm 1 guarantees BIBO stability. This is summarized below.

Corollary 1 (Bounded variation)

Suppose (1) has model variation Δ[0,t]M\Delta_{[0,t]}\leq M for a constant MM. Then,

suptxt\displaystyle\sup_{t}\left\|x_{t}\right\| c1c2M1ρL.\displaystyle\leq\frac{c_{1}\cdot c_{2}^{M}}{1-\rho_{L}}.
Corollary 2 (Unbounded but sublinear variation)

Let α(0,1)\alpha\in(0,1) and t+t\in\mathbb{N}_{+}. Suppose (1) is such that for each ktk\leq t, Δ[k,k+1]δt:=1/t(1α)\Delta_{[k,k+1]}\leq\delta_{t}:=1/{t^{(1-\alpha)}}, implying a total model variation Δ[0,t]=𝒪(tα)\Delta_{[0,t]}=\mathcal{O}(t^{\alpha}). Then for large enough tt, ρLc2δt1+ρL2\rho_{L}c_{2}^{\delta_{t}}\leq\frac{1+\rho_{L}}{2}, and therefore

xkc1i=0k(ρLc2δt)i2c11ρL.\displaystyle\left\|x_{k}\right\|\leq c_{1}\sum_{i=0}^{k}\left(\rho_{L}c_{2}^{\delta_{t}}\right)^{i}\leq\frac{2c_{1}}{1-\rho_{L}}.

Corollary 1 can be useful for scenarios where the mode of operation of the system changes infrequently and for systems such that θ(t)θ\theta(t)\rightarrow\theta^{\star} as tt\rightarrow\infty [44]. As an example, consider power systems where a prescribed set of lines can potentially become disconnected from the grid and thus change the grid topology. Corollary 2 applies to slowly drifting systems [45].

III-B Efficient implementation of CBC

In general, computing the functional Steiner point of the work function may be computationally expensive. However, by taking advantage of the LTV structure, we are able to design an efficient implementation in our setting. The key observation is that, for each t, \mathcal{P}_{t} (Algorithm 1, line 3) can be described by an intersection of half-spaces: the ambient parameter space \Theta is assumed to be a polytope, and by linearity of (1) the observed transition from x_{t},\,u_{t} to x_{t+1} specifies two half-space constraints per state coordinate at each time step. Our approach to approximating the functional Steiner point for chasing the consistent model sets is inspired by [34], where second-order cone programs (SOCPs) are used to approximate the (nested-set) Steiner point of the sublevel sets of the work functions for chasing half-spaces.

Denote by \{(a_{i},b_{i})\}_{i=1}^{p_{t}} the collection of p_{t} half-space constraints describing \mathcal{P}_{t}, i.e., a_{i}^{\top}\theta\leq b_{i}. To approximate the integral for the functional Steiner point (2) of \omega_{t}, we sample N random directions v\in\mathbb{S}^{n-1}, evaluate the Fenchel conjugate of the work function \omega^{*}_{t} at each v with an SOCP, and take the empirical average. Finally, we project the estimated functional Steiner point back onto the consistent model set \mathcal{P}_{t}\cap\Theta. Even though the analytical functional Steiner point (2) is guaranteed to be a member of the consistent model set, the projection step is necessary because we integrate numerically, which may result in an approximation that ends up outside the set. We summarize this procedure in Algorithm 2. Specifically, given a direction v\in\mathbb{S}^{n-1}, the Fenchel conjugate of the work function at time step t is

ωt(v)\displaystyle{\omega^{*}_{t}}(v) =infxnωt(x)x,v\displaystyle=\inf_{x\in\mathbb{R}^{n}}\omega_{t}(x)-\left\langle x,v\right\rangle
=minxnqs𝒦ss=1tqsqs1+xqtx,v.\displaystyle=\min_{\begin{subarray}{c}x\in\mathbb{R}^{n}\\ q_{s}\in\mathcal{K}_{s}\end{subarray}}\sum_{s=1}^{t}\left\|q_{s}-q_{s-1}\right\|+\left\|x-q_{t}\right\|-\left\langle x,v\right\rangle.

This can be equivalently expressed as the following SOCP with decision variables x,q1,,qt,λ,λ1,,λtx,q_{1},\ldots,q_{t},\lambda,\lambda_{1},\ldots,\lambda_{t}:

minx,q1,,qtλ,λ1,,λt\displaystyle\min_{\begin{subarray}{c}x,q_{1},\ldots,q_{t}\\ \lambda,\lambda_{1},\ldots,\lambda_{t}\end{subarray}}\quad λ+s=1tλsv,x\displaystyle\,\lambda+\sum_{s=1}^{t}\lambda_{s}-\left\langle v,x\right\rangle (12)
   s.t. qsqs1λs,for s[t]\displaystyle\left\|q_{s}-q_{s-1}\right\|\leq\ \lambda_{s},\quad\text{for }s\in[t]
xqtλ\displaystyle\left\|x-q_{t}\right\|\leq\ \lambda
aiqsbi,for i[ps],s[t]\displaystyle a_{i}^{\top}q_{s}\leq b_{i},\quad\text{for }i\in[p_{s}],\,s\in[t]
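A minimal Python sketch of this procedure with cvxpy is given below; it is an illustration under assumptions, not the paper's implementation. The epigraph variables \lambda,\lambda_{1},\ldots,\lambda_{t} of (12) are handled internally by cvxpy's norm atoms, `sets` is a hypothetical list of (A_s, b_s) pairs describing \mathcal{P}_{1},\ldots,\mathcal{P}_{t}, q0 is the flattened \widehat{\theta}_{0}, and for brevity the final projection is onto \mathcal{P}_{t} only rather than \Theta\cap\mathcal{P}_{t}.

```python
import numpy as np
import cvxpy as cp

def work_fn_conjugate(v, sets, q0):
    """Fenchel conjugate omega_t^*(v) of the work function, via the
    SOCP (12); `sets` lists (A_s, b_s) with P_s = {q : A_s q <= b_s}."""
    d = q0.shape[0]
    q = [cp.Variable(d) for _ in sets]
    x = cp.Variable(d)
    path = cp.norm(q[0] - q0)
    cons = [sets[0][0] @ q[0] <= sets[0][1]]
    for s in range(1, len(sets)):
        path = path + cp.norm(q[s] - q[s - 1])
        cons.append(sets[s][0] @ q[s] <= sets[s][1])
    prob = cp.Problem(cp.Minimize(path + cp.norm(x - q[-1]) - v @ x), cons)
    prob.solve()
    return prob.value

def cbc_select(sets, q0, num_samples=100, rng=None):
    """Sketch of Algorithm 2: Monte Carlo approximation of the functional
    Steiner point st(omega_t), then projection back onto the latest set."""
    rng = np.random.default_rng(rng)
    d = q0.shape[0]
    acc = np.zeros(d)
    for _ in range(num_samples):
        v = rng.standard_normal(d)
        v /= np.linalg.norm(v)
        acc += work_fn_conjugate(v, sets, q0) * v
    point = -d * acc / num_samples          # -n * (average of omega_t^*(v) v)
    A_t, b_t = sets[-1]                     # project onto P_t
    p = cp.Variable(d)
    cp.Problem(cp.Minimize(cp.sum_squares(p - point)), [A_t @ p <= b_t]).solve()
    return p.value
```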

Another potential implementation challenge is that the number of constraints in the SOCP (12) grows linearly with time due to the construction of the work function (4). This is a common drawback of online control methods based on CBC and NCBC techniques and can be overcome in practice through truncation or over-approximation of the work functions. Additionally, if the LTV system is periodic with a known period, then we can leverage Algorithm 1 during the initial data collection phase. Once representative (persistently exciting) data is available, one could employ methods like [3] to generate a stabilizing controller for the unknown LTV system. In Section IV, we show that data collection via Algorithm 1 results in a significantly smaller state norm than random noise injection when the system is unstable.

IV Simulation

In this section, we demonstrate Algorithm 1 in two LTV systems. Both of the systems we consider are open-loop unstable, thus the algorithms must work to stabilize them. We use the same algorithm parameters for both, with Θ=[2, 3]2\Theta=[-2,\,3]^{2}, LQR cost matrices Q=IQ=I and R=1R=1.

IV-A Example 1: Markov linear jump system

We consider the following Markov linear jump system (MLJS) model from [46], with

A_{1}=\begin{bmatrix}1.5&1\\ 0&0.5\end{bmatrix},\quad A_{2}=\begin{bmatrix}0.6&0\\ 0.1&1.2\end{bmatrix},\quad B_{1}=\begin{bmatrix}0\\ 1\end{bmatrix},
B_{2}=\begin{bmatrix}1\\ 1\end{bmatrix},\quad\Pi=\begin{bmatrix}0.8&0.2\\ 0.1&0.9\end{bmatrix}

where \Pi is the transition probability matrix between \theta_{1} and \theta_{2}. We inject disturbances drawn uniformly at random from \{-10\mathds{1},\,-3\mathds{1},\,3\mathds{1}\}, where \mathds{1} is the all-one vector. We set the disturbances to zero for the last 10 time steps to make the stability of the closed loop explicit. As baselines, we implement certainty-equivalent control based on online least squares (OLS) with different sliding window sizes L=5,\,10,\,20 and an exponential forgetting factor of 0.95 [47].
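For reference, such an OLS baseline can be sketched as a weighted least-squares fit over the most recent transitions; the following is an assumed illustration (the window length L and the forgetting factor are the tuning knobs discussed below), not the exact baseline code used for the experiments.

```python
import numpy as np

def ols_estimate(xs, us, L=10, forget=0.95):
    """Sliding-window least-squares estimate of theta = [A B] from the
    last L transitions (xs[t], us[t]) -> xs[t+1], with exponential
    forgetting applied to older samples."""
    T = len(us)
    s = max(0, T - L)
    Z = np.array([np.concatenate([xs[t], us[t]]) for t in range(s, T)])
    Y = np.array([xs[t + 1] for t in range(s, T)])
    w = np.sqrt(np.array([forget ** (T - 1 - t) for t in range(s, T)]))
    # weighted least squares: minimize sum_t w_t^2 ||xs[t+1] - theta z_t||^2
    sol, *_ = np.linalg.lstsq(w[:, None] * Z, w[:, None] * Y, rcond=None)
    return sol.T  # theta with rows [A B]
```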

We generate two different MLJS models from two random seeds and show the results in Figure 1. For both systems, the open loop is unstable. In Figure 1(a), the OLS-based algorithm fails to stabilize the system for window size L=20, while for L=5,\,10 it stabilizes the system but incurs a larger state norm than the proposed algorithm. On the other hand, in Figure 1(b), OLS with L=5 results in an unstable closed loop. This example highlights a challenge of OLS-based methods: the choice of window size is crucial for performance. Since the underlying LTV system is unknown and our goal is to control the system online, it is unclear how to select an appropriate window size a priori to guarantee stability for OLS-based methods. In contrast, Algorithm 1 does not require any parameter tuning.

We note that while advanced least-squares-based identification techniques that incorporate sliding windows of variable length exist, e.g., [4, 47], it is unclear, given the unknown system parameters, how to choose their various algorithm parameters, such as thresholds for system-change detection. Therefore, we only compare Algorithm 1 against fixed-length sliding-window OLS methods as baselines.

(a) Closed loop of the system generated with seed #1. (b) Closed loop of the system generated with seed #2.
Figure 1: Markov linear jump system for two different random seeds. For each seed: the top plot shows the state norm trajectories of the proposed algorithm, certainty-equivalent control based on online least squares (OLS) with different sliding window sizes, and the open loop. The middle plot shows the norm of the hypothesis model selected via Algorithm 2. The bottom plot shows the true model switches.

IV-B Example 2: LTV system

Our second example highlights that Algorithm 1 is a useful data-collection alternative to open-loop random noise injection. We consider the LTV system from [3, 28], with

A(k)=\begin{bmatrix}1.5&0.0025k\\ -0.1\cos(0.3k)&1+0.05^{3/2}\sin(0.5k)\sqrt{k}\end{bmatrix},
B(k)=0.05\begin{bmatrix}1\\ \frac{0.1k+2}{0.1k+3}\end{bmatrix},

where we have modified A(1,1) from 1 to 1.5 to increase the instability of the open loop at the beginning, making the system more challenging to stabilize. We consider no disturbances here, which is a common setting in direct data-driven control, e.g., [3, 26, 27]. In particular, we compare the proposed algorithm against randomly generated bounded inputs drawn from \textsf{UNIF}[-1,1]. We also modify the control inputs from Algorithm 1 to u_{t}=\widehat{K}_{t-1}x_{t}+\eta_{t}\cdot\mathds{1} with \eta_{t}\sim\textsf{UNIF}[-1,1], so that we can collect rich data in the closed loop. This is motivated by the growing body of data-driven control methods, such as [3, 27, 28], that leverage sufficiently rich offline data to perform control design for unknown LTV systems. However, most of these works directly inject random inputs for data collection. It is evident in Figure 2 that when the open-loop system is unstable, it may be undesirable to run the system without any feedback control. Therefore, Algorithm 1 complements existing data-driven methods by allowing safe data collection with significantly better transient behavior.
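For reference, the time-varying matrices above can be transcribed directly for simulation; a minimal sketch in Python (NumPy assumed):

```python
import numpy as np

def A_of_k(k):
    """A(k) of Example 2 (with the modified A(1,1) = 1.5 entry)."""
    return np.array([
        [1.5, 0.0025 * k],
        [-0.1 * np.cos(0.3 * k), 1 + 0.05 ** 1.5 * np.sin(0.5 * k) * np.sqrt(k)],
    ])

def B_of_k(k):
    """B(k) of Example 2."""
    return 0.05 * np.array([[1.0], [(0.1 * k + 2) / (0.1 * k + 3)]])
```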

Figure 2: Simulation result for the LTV system in Example 2. Here we plot the state and control norms, as well as the hypothesis models \widehat{\theta}_{t} selected via CBC and the true models \theta_{t}.

V Concluding remarks

In this paper, we propose a model-based approach for stabilizing an unknown LTV system, in the bounded-input-bounded-output sense, under arbitrary non-stochastic disturbances and the assumption of infrequently changing or slowly drifting dynamics. Our approach uses ideas from convex body chasing (CBC), an online problem in which an agent must choose a sequence of points within sequentially presented convex sets with the aim of minimizing the sum of distances between the chosen points. The algorithm requires minimal tuning and achieves significantly better performance than naive online-least-squares-based control. Future work includes sharpening the stability analysis to go beyond the BIBO guarantee in this work, which will require controlling the difference between the estimated disturbances and the true disturbances. Another direction is to extend the current results to the networked setting, similar to [24].

References

  • [1] D. Deka, S. Backhaus, and M. Chertkov, “Structure learning in power distribution networks,” IEEE Transactions on Control of Network Systems, vol. 5, no. 3, pp. 1061–1074, Sep. 2018.
  • [2] P. Gradu, E. Hazan, and E. Minasyan, “Adaptive regret for control of time-varying dynamics,” arXiv preprint arXiv:2007.04393, 2020.
  • [3] B. Nortmann and T. Mylvaganam, “Data-driven control of linear time-varying systems,” in 2020 59th IEEE Conference on Decision and Control (CDC).   IEEE, 2020, pp. 3939–3944.
  • [4] Y. Luo, V. Gupta, and M. Kolar, “Dynamic regret minimization for control of non-stationary linear dynamical systems,” Proceedings of the ACM on Measurement and Analysis of Computing Systems, vol. 6, no. 1, pp. 1–72, 2022.
  • [5] Y. Lin, J. Preiss, E. Anand, Y. Li, Y. Yue, and A. Wierman, “Online adaptive controller selection in time-varying systems: No-regret via contractive perturbations,” arXiv preprint arXiv:2210.12320, 2022.
  • [6] E. Minasyan, P. Gradu, M. Simchowitz, and E. Hazan, “Online control of unknown time-varying dynamical systems,” Advances in Neural Information Processing Systems, vol. 34, pp. 15 934–15 945, 2021.
  • [7] R. Tedrake, “Underactuated robotics: Learning, planning, and control for efficient and agile machines course notes for mit 6.832,” Working draft edition, vol. 3, p. 4, 2009.
  • [8] P. Falcone, F. Borrelli, H. E. Tseng, J. Asgari, and D. Hrovat, “Linear time-varying model predictive control and its application to active steering systems: Stability analysis and experimental validation,” International Journal of Robust and Nonlinear Control: IFAC-Affiliated Journal, vol. 18, no. 8, pp. 862–875, 2008.
  • [9] K. S. Tsakalis and P. A. Ioannou, Linear time-varying systems: control and adaptation.   Prentice-Hall, Inc., 1993.
  • [10] R. Tóth, Modeling and identification of linear parameter-varying systems.   Springer, 2010, vol. 403.
  • [11] J. Mohammadpour and C. W. Scherer, Control of linear parameter varying systems with applications.   Springer Science & Business Media, 2012.
  • [12] W. Zhang, Q.-L. Han, Y. Tang, and Y. Liu, “Sampled-data control for a class of linear time-varying systems,” Automatica, vol. 103, pp. 126–134, 2019.
  • [13] R. Mojgani and M. Balajewicz, “Stabilization of linear time-varying reduced-order models: A feedback controller approach,” International Journal for Numerical Methods in Engineering, vol. 121, no. 24, pp. 5490–5510, 2020.
  • [14] G. Goel and B. Hassibi, “Regret-optimal estimation and control,” IEEE Transactions on Automatic Control, 2023.
  • [15] K. Tsakalis and P. Ioannou, “Adaptive control of linear time-varying plants,” Automatica, vol. 23, no. 4, pp. 459–468, 1987.
  • [16] J.-J. Slotine and J. Coetsee, “Adaptive sliding controller synthesis for non-linear systems,” International Journal of Control, vol. 43, no. 6, pp. 1631–1651, 1986.
  • [17] R. Marino and P. Tomei, “Adaptive control of linear time-varying systems,” Automatica, vol. 39, no. 4, pp. 651–659, 2003.
  • [18] M. Verhaegen and X. Yu, “A class of subspace model identification algorithms to identify periodically and arbitrarily time-varying systems,” Automatica, vol. 31, no. 2, pp. 201–216, 1995.
  • [19] B. Bamieh and L. Giarre, “Identification of linear parameter varying models,” International Journal of Robust and Nonlinear Control: IFAC-Affiliated Journal, vol. 12, no. 9, pp. 841–853, 2002.
  • [20] T. Sarkar, A. Rakhlin, and M. Dahleh, “Nonparametric system identification of stochastic switched linear systems,” in 2019 IEEE 58th Conference on Decision and Control (CDC), 2019.
  • [21] S. Dean, S. Tu, N. Matni, and B. Recht, “Safely learning to control the constrained linear quadratic regulator,” in 2019 American Control Conference (ACC).   IEEE, 2019, pp. 5582–5588.
  • [22] S. Talebi, S. Alemzadeh, N. Rahimi, and M. Mesbahi, “On regularizability and its application to online control of unstable lti systems,” IEEE Transactions on Automatic Control, 2021.
  • [23] S. Lale, K. Azizzadenesheli, B. Hassibi, and A. Anandkumar, “Reinforcement learning with fast stabilization in linear dynamical systems,” in International Conference on Artificial Intelligence and Statistics.   PMLR, 2022, pp. 5354–5390.
  • [24] J. Yu, D. Ho, and A. Wierman, “Online adversarial stabilization of unknown networked systems,” Proceedings of the ACM on Measurement and Analysis of Computing Systems, vol. 7, no. 1, pp. 1–43, 2023.
  • [25] X. Chen and E. Hazan, “Black-box control for linear dynamical systems,” in Conference on Learning Theory.   PMLR, 2021.
  • [26] M. Rotulo, C. De Persis, and P. Tesi, “Online learning of data-driven controllers for unknown switched linear systems,” Automatica, vol. 145, p. 110519, 2022.
  • [27] S. Baros, C.-Y. Chang, G. E. Colon-Reyes, and A. Bernstein, “Online data-enabled predictive control,” Automatica, vol. 138, p. 109926, 2022.
  • [28] B. Pang, T. Bian, and Z.-P. Jiang, “Data-driven finite-horizon optimal control for linear time-varying discrete-time systems,” in 2018 IEEE Conference on Decision and Control (CDC), 2018, pp. 861–866.
  • [29] S.-J. Liu, M. Krstic, and T. Başar, “Batch-to-batch finite-horizon lq control for unknown discrete-time linear systems via stochastic extremum seeking,” IEEE Transactions on Automatic Control, vol. 62, no. 8, pp. 4116–4123, 2016.
  • [30] G. Qu, Y. Shi, S. Lale, A. Anandkumar, and A. Wierman, “Stable online control of linear time-varying systems,” in Learning for Dynamics and Control.   PMLR, 2021, pp. 742–753.
  • [31] Y. Ouyang, M. Gagrani, and R. Jain, “Learning-based control of unknown linear systems with thompson sampling,” arXiv preprint arXiv:1709.04047, 2017.
  • [32] Y. Han, R. Solozabal, J. Dong, X. Zhou, M. Takac, and B. Gu, “Learning to control under time-varying environment,” arXiv preprint arXiv:2206.02507, 2022.
  • [33] M. Sellke, “Chasing convex bodies optimally,” in Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms.   SIAM, 2020, pp. 1509–1518.
  • [34] C. Argue, A. Gupta, Z. Tang, and G. Guruganesh, “Chasing convex bodies with linear competitive ratio,” Journal of the ACM (JACM), vol. 68, no. 5, pp. 1–10, 2021.
  • [35] N. Bansal, M. Böhm, M. Eliáš, G. Koumoutsos, and S. W. Umboh, “Nested convex bodies are chaseable,” in Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms.   SIAM, 2018, pp. 1253–1260.
  • [36] S. Bubeck, B. Klartag, Y. T. Lee, Y. Li, and M. Sellke, “Chasing nested convex bodies nearly optimally,” in Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms.   SIAM, 2020.
  • [37] D. Ho, H. Le, J. Doyle, and Y. Yue, “Online robust control of nonlinear systems with large uncertainty,” in International Conference on Artificial Intelligence and Statistics.   PMLR, 2021, pp. 3475–3483.
  • [38] C. Yeh, J. Yu, Y. Shi, and A. Wierman, “Robust online voltage control with an unknown grid topology,” in Proceedings of the Thirteenth ACM International Conference on Future Energy Systems, 2022, pp. 240–250.
  • [39] B. Ramasubramanian, B. Xiao, L. Bushnell, and R. Poovendran, “Safety-critical online control with adversarial disturbances,” in 2020 59th IEEE Conference on Decision and Control (CDC).   IEEE, 2020.
  • [40] E.-W. Bai, R. Tempo, and H. Cho, “Membership set estimators: size, optimal inputs, complexity and relations with least squares,” IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications, vol. 42, no. 5, pp. 266–277, 1995.
  • [41] A. Cohen, T. Koren, and Y. Mansour, “Learning linear-quadratic regulators efficiently with only sqrt(t) regret,” in International Conference on Machine Learning.   PMLR, 2019, pp. 1300–1309.
  • [42] N. Agarwal, B. Bullins, E. Hazan, S. Kakade, and K. Singh, “Online control with adversarial disturbances,” in International Conference on Machine Learning.   PMLR, 2019, pp. 111–119.
  • [43] A. Cohen, A. Hassidim, T. Koren, N. Lazic, Y. Mansour, and K. Talwar, “Online linear quadratic control,” in International Conference on Machine Learning.   PMLR, 2018, pp. 1029–1038.
  • [44] W. Hahn et al., Stability of motion.   Springer, 1967, vol. 138.
  • [45] F. Amato, G. Celentano, and F. Garofalo, “New sufficient conditions for the stability of slowly varying linear systems,” IEEE Transactions on Automatic Control, vol. 38, no. 9, pp. 1409–1411, 1993.
  • [46] J. Xiong and J. Lam, “Stabilization of discrete-time markovian jump linear systems via time-delayed controllers,” Automatica, vol. 42, no. 5, pp. 747–753, 2006.
  • [47] J. Jiang and Y. Zhang, “A revisit to block and recursive least squares for parameter estimation,” Computers & Electrical Engineering, vol. 30, no. 5, pp. 403–416, 2004.
  • [48] U. Shaked, “Guaranteed stability margins for the discrete-time linear quadratic optimal regulator,” IEEE Transactions on Automatic Control, vol. 31, no. 2, pp. 162–165, 1986.
  • [49] A. S. Householder, The theory of matrices in numerical analysis.   Courier Corporation, 2013.
  • [50] M. Simchowitz and D. Foster, “Naive exploration is optimal for online lqr,” in International Conference on Machine Learning.   PMLR, 2020.
  • [51] P. Gahinet and A. Laub, “Computable bounds for the sensitivity of the algebraic riccati equation,” SIAM journal on control and optimization, vol. 28, no. 6, pp. 1461–1480, 1990.

-A Proof of Lemma 1

We have

\sum_{\tau=s+1}^{e}\left\|\widehat{\theta}_{\tau}-\widehat{\theta}_{\tau-1}\right\|_{F} =\sum_{\tau=s+1}^{e}\left\|\textsf{st}(\omega_{\tau})-\textsf{st}(\omega_{\tau-1})\right\|_{F}
\stackrel{(a)}{\leq} n\,\fint_{v}\left(\sum_{\tau=s+1}^{e}\left|\omega^{*}_{\tau}(v)-\omega^{*}_{\tau-1}(v)\right|\right)\,v\,dv
\stackrel{(b)}{=} n\,\fint_{v}\left(\sum_{\tau=s+1}^{e}\omega^{*}_{\tau}(v)-\omega^{*}_{\tau-1}(v)\right)\,v\,dv
= n\,\fint_{v}\left(\omega^{*}_{e}(v)-\omega^{*}_{s}(v)\right)\,v\,dv
\stackrel{(c)}{\leq} n\cdot\left(\min_{x}\omega_{e}(x)-\min_{y}\omega_{s}(y)+2\kappa\right) (13)

where (a) is due to the definition (2). For (b), we used the observation that \omega^{*}_{t}(v) is non-decreasing in time. For (c), by definition of the Fenchel conjugate (3), we have \omega^{*}_{e}(v)=\inf_{x}\omega_{e}(x)-\left\langle x,v\right\rangle. Denote by (x^{\star},q_{1}^{\star},\ldots,q_{e}^{\star}) the optimal solution to the problem \min_{x}\omega_{e}(x). It is clear that \omega^{*}_{e}(v)\leq\omega_{e}(x^{\star})-\left\langle x^{\star},v\right\rangle\leq\min_{x}\omega_{e}(x)+\kappa, where in the last inequality we used Cauchy-Schwarz and \kappa:=\max_{\theta\in\Theta}\left\|\theta\right\|_{F}. Similarly, we also have \omega_{s}^{*}(v)\geq\inf_{y}\omega_{s}(y)-\kappa.

Denote OPT[0,e]\textsc{OPT}_{[0,e]} as the minimizing trajectory (OPT0,,OPTe)(\textsc{OPT}_{0},\,\ldots,\textsc{OPT}_{e}) to minxωe(x)\min_{x}\omega_{e}(x) where argminxωe(x)=OPTe\text{argmin}_{x}\omega_{e}(x)=\textsc{OPT}_{e}. This last equality is by the observation that if x:=argminxωe(x)OPTex^{\star}:=\text{argmin}_{x}\omega_{e}(x)\not=\textsc{OPT}_{e}, then ωe(OPTe)ωe(x)\omega_{e}(\textsc{OPT}_{e})\leq\omega_{e}(x^{\star}) by definition (4), thus contradicting that xx^{\star} is defined to be the minimizer of ωe\omega_{e}. We also denote INT[0,s]\textsc{INT}_{[0,s]} as the minimizing trajectory to minyωs(y)\min_{y}\omega_{s}(y). To reduce notation, we denote Δ[s,e]OPT:=τ=s+1eOPTτOPTτ1F\Delta^{\textsc{OPT}}_{[s,e]}:=\sum_{\tau=s+1}^{e}\left\|\textsc{OPT}_{\tau}-\textsc{OPT}_{\tau-1}\right\|_{F} and Δ[s,e]INT:=τ=s+1eINTτINTτ1F\Delta^{\textsc{INT}}_{[s,e]}:=\sum_{\tau=s+1}^{e}\left\|\textsc{INT}_{\tau}-\textsc{INT}_{\tau-1}\right\|_{F}. Then we have

(13)\displaystyle\eqref{eq:intermediate-step} =n(Δ[0,e]OPTΔ[0,s]INT+2κ)\displaystyle=n\cdot\left(\Delta^{\textsc{OPT}}_{[0,e]}-\Delta^{\textsc{INT}}_{[0,s]}+2\kappa\right)
(c)n(Δ[0,e]OPTΔ[0,s]OPT+dia(Θ)+2κ)\displaystyle\stackrel{{\scriptstyle(c)}}{{\leq}}n\cdot\left(\Delta^{\textsc{OPT}}_{[0,e]}-\Delta^{\textsc{OPT}}_{[0,s]}+\textsf{dia}(\Theta)+2\kappa\right)
=n(Δ[s,e]OPT+dia(Θ)+2κ).\displaystyle=n\cdot\left(\Delta^{\textsc{OPT}}_{[s,e]}+\textsf{dia}(\Theta)+2\kappa\right).

where (c) holds because if τ=1sOPTτOPTτ1F>τ=1sINTτINTτ1F+dia(Θ)\sum_{\tau=1}^{s}\left\|\textsc{OPT}_{\tau}-\textsc{OPT}_{\tau-1}\right\|_{F}>\sum_{\tau=1}^{s}\left\|\textsc{INT}_{\tau}-\textsc{INT}_{\tau-1}\right\|_{F}+\textsf{dia}(\Theta) and OPT[0,s]INT[0,s]\textsc{OPT}_{[0,s]}\not=\textsc{INT}_{[0,s]}, then we can replace the [0,s][0,s] portion of the optimal trajectory OPT[0,e]\textsc{OPT}_{[0,e]} with INT[0,s]\textsc{INT}_{[0,s]} and achieve a lower cost for ωe(OPTe)\omega_{e}(\textsc{OPT}_{e}), thus contradicting the optimality of OPT[0,e]\textsc{OPT}_{[0,e]}. To see why the fictitious trajectory (INT[0,s],OPT[s+1,e])\left(\textsc{INT}_{[0,s]},\textsc{OPT}_{[s+1,e]}\right) achieves lower cost than OPT[0,e]\textsc{OPT}_{[0,e]}, we compare the total movement cost during the interval [0,s+1][0,s+1],

τ=1sINTτINTτ1F+OPTs+1INTsF\displaystyle\sum_{\tau=1}^{s}\left\|\textsc{INT}_{\tau}-\textsc{INT}_{\tau-1}\right\|_{F}+\left\|\textsc{OPT}_{s+1}-\textsc{INT}_{s}\right\|_{F}
τ=1sINTτINTτ1F+OPTs+1OPTsF\displaystyle\leq\sum_{\tau=1}^{s}\left\|\textsc{INT}_{\tau}-\textsc{INT}_{\tau-1}\right\|_{F}+\left\|\textsc{OPT}_{s+1}-\textsc{OPT}_{s}\right\|_{F}
+OPTsINTsF\displaystyle\qquad+\left\|\textsc{OPT}_{s}-\textsc{INT}_{s}\right\|_{F}
τ=1sINTτINTτ1F+OPTs+1OPTsF+dia(Θ)\displaystyle\leq\sum_{\tau=1}^{s}\left\|\textsc{INT}_{\tau}-\textsc{INT}_{\tau-1}\right\|_{F}+\left\|\textsc{OPT}_{s+1}-\textsc{OPT}_{s}\right\|_{F}+\textsf{dia}(\Theta)
<τ=1sOPTτOPTτ1F+OPTs+1OPTsF,\displaystyle<\sum_{\tau=1}^{s}\left\|\textsc{OPT}_{\tau}-\textsc{OPT}_{\tau-1}\right\|_{F}+\left\|\textsc{OPT}_{s+1}-\textsc{OPT}_{s}\right\|_{F},

which means the fictitious trajectory achieves lower overall cost. Therefore (c) must hold. \blacksquare

-B Auxiliary results

Here we summarize the helper lemmas used in the proof sketch of Theorem 1. First, we define some useful notation.

Lyapunov equation. Let X,Y\in\mathbb{R}^{n\times n} with Y=Y^{\top}\succ 0 and \rho(X)<1. Define \textsf{dlyap}(X,Y) to be the unique positive definite solution Z to the Lyapunov equation X^{\top}ZX-Z=-Y. For a stabilizable system (A,B) with optimal infinite-horizon LQR feedback K:=K^{*}([A\ B]) for cost matrices Q=R=I, we define

P(A,B)=dlyap(A+BK([AB]),In+K([AB])K([AB]))P(A,B)=\textsf{dlyap}(A+BK^{*}([A\ B]),\,I_{n}+K^{*}([A\ B])^{\top}K^{*}([A\ B]))

and

H(A,B)=dlyap(A+BK([AB]),In).H(A,B)=\textsf{dlyap}(A+BK^{*}([A\ B]),\,I_{n}).

We also define the shorthand for the following:

Pt:=P(A^t,B^t),Ht:=H(A^t,B^t).{P}_{t}:=P(\widehat{A}_{t},\widehat{B}_{t}),\quad{H}_{t}:=H(\widehat{A}_{t},\widehat{B}_{t}). (14)
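As a concrete illustration of these definitions (a minimal numerical sketch, assuming SciPy is available; the two-dimensional system below is hypothetical), K^{*}([A\ B]) can be obtained from the discrete algebraic Riccati equation and P(A,B), H(A,B) from the corresponding Lyapunov equations:

import numpy as np
from scipy.linalg import solve_discrete_are, solve_discrete_lyapunov

def lqr_gain(A, B):
    # Optimal infinite-horizon LQR gain K*([A B]) with Q = R = I, for u = K x,
    # so that the closed loop is A + B K.
    n, m = A.shape[0], B.shape[1]
    S = solve_discrete_are(A, B, np.eye(n), np.eye(m))
    return -np.linalg.solve(np.eye(m) + B.T @ S @ B, B.T @ S @ A)

def P_and_H(A, B):
    # P(A,B) = dlyap(A+BK, I_n + K^T K) and H(A,B) = dlyap(A+BK, I_n).
    n = A.shape[0]
    K = lqr_gain(A, B)
    F = A + B @ K
    # solve_discrete_lyapunov(F.T, Y) returns Z with Z - F^T Z F = Y.
    P = solve_discrete_lyapunov(F.T, np.eye(n) + K.T @ K)
    H = solve_discrete_lyapunov(F.T, np.eye(n))
    return P, H

A = np.array([[0.9, 0.2], [0.0, 0.8]])  # hypothetical stabilizable system
B = np.array([[0.0], [1.0]])
P, H = P_and_H(A, B)
print(np.linalg.norm(P, 2), np.linalg.norm(H, 2))  # spectral norms of P(A,B), H(A,B)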

Constants. Throughout the proof, we reference the following system-theoretic constants for the parameter set \Theta defined in 2:

K:=sup[AB]ΘK([AB]),γ:=max[AB]ΘA+BK([AB]).\displaystyle\left\|K_{*}\right\|:=\sup_{[A\ B]\in\Theta}\left\|K^{*}([A\ B])\right\|,\gamma_{*}:=\max_{[A\ B]\in\Theta}\left\|A+BK^{*}([A\ B])\right\|.

We also quantify the stability of every model in Θ\Theta under its corresponding optimal LQR gain. Let

C>0,r(0,1)C_{*}>0,\quad r_{*}\in(0,1)

be such that for all \theta:=[A\ B]\in\Theta, K:=K^{*}(\theta), and i\in\mathbb{N}_{+}, \left\|\left(\left(A+BK\right)^{\top}\right)^{i}\right\|\cdot\left\|\left(A+BK\right)^{i}\right\|\leq C_{*}r_{*}^{2i}. By Lemma 2 (stated below) and 2, such C_{*} and r_{*} always exist. Further, we define

P\displaystyle\left\|P_{*}\right\| :=sup[AB]ΘP(A,B),H:=sup[AB]ΘH(A,B),\displaystyle:=\sup_{[A\ B]\in\Theta}\left\|P(A,B)\right\|,\quad\left\|H_{*}\right\|:=\sup_{[A\ B]\in\Theta}\left\|H(A,B)\right\|,
ϵ\displaystyle\epsilon_{*} :=1/(54P5),c:=max[AB]ΘλmaxH(A,B)λminH(A,B),\displaystyle:=1/\left(54\left\|P_{*}\right\|^{5}\right),\quad c_{*}:=\max_{[A\ B]\in\Theta}\frac{\lambda_{\max}H(A,B)}{\lambda_{\min}H(A,B)},
h\displaystyle h_{*} :=sup[A1B1],[A2B2]ΘH(A1,B1)1/2H(A2,B2)1/2.\displaystyle:=\sup_{[A_{1}\ B_{1}],\,[A_{2}\ B_{2}]\in\Theta}\left\|H(A_{1},B_{1})^{1/2}\right\|\left\|H(A_{2},B_{2})^{-1/2}\right\|.

To justify the existence of these constants, note that the discrete-time optimal LQR controller has a guaranteed stability margin [48]. Moreover, by Lemma 2 and the fact that the solution to the Lyapunov equation has the closed form

P(A,B)=i=0((A+BK))i(I+KK)(A+BK)i,P(A,B)=\sum_{i=0}^{\infty}\left((A+BK)^{\top}\right)^{i}(I+K^{\top}K)(A+BK)^{i}, (15)

we have that for all [A,B]Θ[A,B]\in\Theta,

P(A,B)\displaystyle\left\|P(A,B)\right\| (1+K2)(1+i=1((A+BK))i(A+BK)i)\displaystyle\leq\left(1+\left\|K\right\|^{2}\right)\left(1+\sum_{i=1}^{\infty}\left\|\left(\left(A+BK\right)^{\top}\right)^{i}\right\|\left\|\left(A+BK\right)^{i}\right\|\right)
(1+K2)(1r2+C)1r2=:P.\displaystyle\leq\frac{\left(1+\left\|K_{*}\right\|^{2}\right)\left(1-r_{*}^{2}+C_{*}\right)}{1-r_{*}^{2}}=:\left\|P_{*}\right\|.

A bound on \left\|H_{*}\right\| can be derived in the same way. Moreover, by the series representation (15), P(A,B)\succeq H(A,B)\succeq I_{n} for every [A\ B]\in\Theta, and hence \left\|P_{*}\right\|\geq\left\|H_{*}\right\|\geq 1.
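The series representation (15) and the ordering \left\|P\right\|\geq\left\|H\right\|\geq 1 can be checked numerically for a given instance; the snippet below reuses lqr_gain, P_and_H, A, and B from the sketch following (14):

import numpy as np

def lyap_series(F, Y, terms=500):
    # Truncation of Z = sum_{i >= 0} (F^T)^i Y F^i, valid when rho(F) < 1.
    Z, M = np.zeros_like(Y), np.eye(F.shape[0])
    for _ in range(terms):
        Z += M.T @ Y @ M  # add (F^i)^T Y F^i
        M = F @ M         # advance F^i -> F^{i+1}
    return Z

K = lqr_gain(A, B)
F = A + B @ K
P_dlyap, H_dlyap = P_and_H(A, B)
assert np.allclose(P_dlyap, lyap_series(F, np.eye(2) + K.T @ K))
assert np.linalg.norm(P_dlyap, 2) >= np.linalg.norm(H_dlyap, 2) >= 1.0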

Lemma 2 ([49, page 183])

For a matrix An×nA\in\mathbb{R}^{n\times n}, with ρ:=ρ(A)\rho:=\rho(A), there exist constants κ1,κ2\kappa_{1},\kappa_{2} such that for any positive integer ii

κ1ρiin11Aiκ2ρiin11\kappa_{1}\rho^{i}i^{n_{1}-1}\leq\left\|A^{i}\right\|\leq\kappa_{2}\rho^{i}i^{n_{1}-1}

where n_{1} is the size of the largest Jordan block associated with an eigenvalue of modulus \rho in the Jordan form of A.
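For illustration (with a hypothetical matrix, not taken from [49]), consider the 2\times 2 Jordan block with eigenvalue 0.9, so that \rho=0.9 and n_{1}=2; the ratio \left\|A^{i}\right\|/(\rho^{i}i^{n_{1}-1}) then remains bounded above and below by positive constants:

import numpy as np

J = np.array([[0.9, 1.0], [0.0, 0.9]])  # single Jordan block: rho = 0.9, n1 = 2
rho, n1 = 0.9, 2
M, ratios = np.eye(2), []
for i in range(1, 200):
    M = M @ J  # M = J^i
    ratios.append(np.linalg.norm(M, 2) / (rho**i * i**(n1 - 1)))
print(min(ratios), max(ratios))  # both stay bounded away from 0 and infinity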

Lemma 3 ([50, Proposition 6])

Let \theta=[A\ B] be a stabilizable system, with optimal controller K:=K^{*}(\theta) and P:=P(A,B). Let \widehat{\theta}=[\widehat{A}\ \widehat{B}] be an estimate of \theta, let \widehat{K}:=K^{*}(\widehat{\theta}) be the optimal controller for the estimate, and let \epsilon:=\max\left\{\left\|A-\widehat{A}\right\|,\left\|B-\widehat{B}\right\|\right\}. Then if \alpha:=8\left\|P\right\|^{2}\epsilon<1:

B(K^K)8(1α)7/4P7/2ϵ.\left\|B\left(\widehat{K}-K\right)\right\|\leq 8(1-\alpha)^{-7/4}\left\|P\right\|^{7/2}\epsilon.
Lemma 4 ([50, Theorem 8])

Let θ=[AB]\theta=[A\ B] be a stabilizable system, with P:=P(A,B)P:=P(A,B), and H=H(A,B)H=H(A,B). Let θ^=[A^B^]\widehat{\theta}=[\widehat{A}\ \widehat{B}] be an estimate of θ\theta satisfying max{AA^,BB^}ϵ\max\left\{\left\|A-\widehat{A}\right\|,\left\|B-\widehat{B}\right\|\right\}\leq\epsilon. Consider certainty equivalent controller K^=K(θ^)\widehat{K}=K^{*}(\widehat{\theta}). Then if ϵ\epsilon is such that 54P5ϵ154\left\|P\right\|^{5}\epsilon\leq 1, we have

(A+BK^)H(A+BK^)\displaystyle(A+B\widehat{K})^{\top}H(A+B\widehat{K}) (112H1)H(112P1)H.\displaystyle\preceq\left(1-\frac{1}{2}\left\|H\right\|^{-1}\right)H\preceq\left(1-\frac{1}{2}\left\|P\right\|^{-1}\right)H.
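This contraction can be spot-checked numerically for a particular instance (the system and perturbation below are hypothetical, and the sufficient condition 54\left\|P\right\|^{5}\epsilon\leq 1 is reported rather than assumed); lqr_gain and P_and_H are reused from the sketch following (14):

import numpy as np

A = np.array([[0.9, 0.2], [0.0, 0.8]]); B = np.array([[0.0], [1.0]])  # hypothetical
P, H = P_and_H(A, B)

eps = 1e-5  # hypothetical estimation error level
A_hat, B_hat = A + eps * np.eye(2), B + eps * np.ones_like(B)
K_hat = lqr_gain(A_hat, B_hat)  # certainty-equivalent controller from the estimate

F = A + B @ K_hat  # true dynamics in closed loop with the certainty-equivalent gain
lhs = F.T @ H @ F
rhs = (1 - 0.5 / np.linalg.norm(H, 2)) * H
print(54 * np.linalg.norm(P, 2) ** 5 * eps <= 1)        # sufficient condition of Lemma 4
print(np.all(np.linalg.eigvalsh(rhs - lhs) >= -1e-12))  # the claimed contraction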
Lemma 5 ([51])

Let XX be the solution to the Lyapunov equation XFXF=MX-F^{\top}XF=M, and let X+ΔXX+\Delta X be the solution to the perturbed problem

Z(F+ΔF)Z(F+ΔF)=M.Z-(F+\Delta F)^{\top}Z(F+\Delta F)=M.

The following inequality holds for the spectral norm:

ΔXX+ΔX2k=0+(F)kFk(2F+ΔF)ΔF.\frac{\|\Delta X\|}{\|X+\Delta X\|}\leq 2\left\|\sum_{k=0}^{+\infty}\left(F^{\top}\right)^{k}F^{k}\right\|\cdot(2\|F\|+\|\Delta F\|)\cdot{\|\Delta F\|}.
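As a sanity check of Lemma 5 on a hypothetical F, \Delta F, and M=I_{n}, one may compare both sides of the inequality directly, forming the series \sum_{k}(F^{\top})^{k}F^{k} by truncation:

import numpy as np
from scipy.linalg import solve_discrete_lyapunov

F = np.array([[0.5, 0.3], [0.0, 0.4]])           # hypothetical stable F
dF = 1e-3 * np.array([[1.0, -1.0], [0.5, 0.2]])  # hypothetical perturbation
M = np.eye(2)

X = solve_discrete_lyapunov(F.T, M)              # X - F^T X F = M
Xp = solve_discrete_lyapunov((F + dF).T, M)      # perturbed solution X + Delta X

S, Fk = np.zeros((2, 2)), np.eye(2)              # S ~ sum_k (F^T)^k F^k (truncated)
for _ in range(200):
    S += Fk.T @ Fk
    Fk = F @ Fk

lhs = np.linalg.norm(Xp - X, 2) / np.linalg.norm(Xp, 2)
rhs = 2 * np.linalg.norm(S, 2) * (2 * np.linalg.norm(F, 2) + np.linalg.norm(dF, 2)) * np.linalg.norm(dF, 2)
print(lhs <= rhs)  # expect True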
Lemma 6

Suppose \epsilon_{t+1}:=\max\left\{\left\|\widehat{A}_{t+1}-\widehat{A}_{t}\right\|,\left\|\widehat{B}_{t+1}-\widehat{B}_{t}\right\|\right\} and \alpha:=8\left\|P_{*}\right\|^{2}\epsilon_{t+1}\leq 1/2. Then H_{t} defined in (14) satisfies

HtHt+1(1+ηt+1)H_{t}\preceq H_{t+1}(1+\eta_{t+1})

for ηt+1:=cβϵt+1\eta_{t+1}:=c_{*}\beta_{*}\epsilon_{t+1}, and

β:=2C1r2(2γ+3+K)(1+32P2+K).\beta_{*}:=\frac{2C_{*}}{1-r_{*}^{2}}\left(2\gamma_{*}+3+\left\|K_{*}\right\|\right)\left(1+32\left\|P_{*}\right\|^{2}+\left\|K_{*}\right\|\right).

Proof of Lemma 6. For notational brevity, we drop the time index on \epsilon and \eta in the proof. Applying Lemma 5 with X=H_{t}, X+\Delta X=H_{t+1}, F=\widehat{A}_{t}+\widehat{B}_{t}\widehat{K}_{t}, \Delta F=(\widehat{A}_{t+1}-\widehat{A}_{t})+(\widehat{B}_{t+1}\widehat{K}_{t+1}-\widehat{B}_{t}\widehat{K}_{t}), and M=I_{n}, we have

\displaystyle\frac{\left\|H_{t+1}-H_{t}\right\|}{\left\|H_{t+1}\right\|}\leq 2\left\|\sum_{k=0}^{\infty}\left((\widehat{A}_{t}+\widehat{B}_{t}\widehat{K}_{t})^{\top}\right)^{k}(\widehat{A}_{t}+\widehat{B}_{t}\widehat{K}_{t})^{k}\right\|
\displaystyle\quad\cdot\Big{(}2\left\|\widehat{A}_{t}+\widehat{B}_{t}\widehat{K}_{t}\right\|+\left\|\widehat{A}_{t+1}-\widehat{A}_{t}\right\|+\left\|\widehat{B}_{t+1}(\widehat{K}_{t+1}-\widehat{K}_{t})\right\|+\left\|(\widehat{B}_{t+1}-\widehat{B}_{t})\widehat{K}_{t}\right\|\Big{)}
\displaystyle\quad\cdot\Big{(}\left\|\widehat{A}_{t+1}-\widehat{A}_{t}\right\|+\left\|\widehat{B}_{t+1}(\widehat{K}_{t+1}-\widehat{K}_{t})\right\|+\left\|(\widehat{B}_{t+1}-\widehat{B}_{t})\widehat{K}_{t}\right\|\Big{)}
\displaystyle\leq\epsilon\,\frac{2C_{*}}{1-r_{*}^{2}}\left(2\gamma_{*}+\epsilon\left(1+32\left\|P_{*}\right\|^{2}+\left\|K_{*}\right\|\right)\right)\left(1+32\left\|P_{*}\right\|^{2}+\left\|K_{*}\right\|\right)
\displaystyle\leq\epsilon\,\frac{2C_{*}}{1-r_{*}^{2}}\left(2\gamma_{*}+3+\left\|K_{*}\right\|\right)\left(1+32\left\|P_{*}\right\|^{2}+\left\|K_{*}\right\|\right)=:\epsilon\beta_{*},

where the second inequality uses Lemma 3 to bound \left\|\widehat{B}_{t+1}(\widehat{K}_{t+1}-\widehat{K}_{t})\right\|\leq 32\left\|P_{t+1}\right\|^{7/2}\epsilon, and the last inequality uses the assumption 8\epsilon\left\|P_{*}\right\|^{2}\leq 1/2.

To show H_{t}\preceq H_{t+1}(1+\eta) for some \eta, it suffices to show that v^{\top}(H_{t}-H_{t+1})v\leq\eta\,v^{\top}H_{t+1}v for all vectors v\in\mathbb{R}^{n}. Using the preceding calculation, we have

v(HtHt+1)v\displaystyle v^{\top}(H_{t}-H_{t+1})v v2HtHt+1\displaystyle\leq\|v\|^{2}\left\|H_{t}-H_{t+1}\right\|
ϵβv2Ht+1\displaystyle\leq\epsilon\beta_{*}\|v\|^{2}\left\|H_{t+1}\right\|
ϵβcλmin(Ht+1)v2\displaystyle\leq\epsilon\beta_{*}c_{*}\lambda_{\min}(H_{t+1})\|v\|^{2}
\displaystyle\leq\epsilon\beta_{*}c_{*}v^{\top}H_{t+1}v,

where the third inequality uses \left\|H_{t+1}\right\|=\lambda_{\max}(H_{t+1})\leq c_{*}\lambda_{\min}(H_{t+1}), which holds by the definition of c_{*}. This proves the desired bound, with \eta=c_{*}\beta_{*}\epsilon and

β=2C1r2(2γ+3+K)(1+32P2+K).\beta_{*}=\frac{2C_{*}}{1-r_{*}^{2}}\left(2\gamma_{*}+3+\left\|K_{*}\right\|\right)\left(1+32\left\|P_{*}\right\|^{2}+\left\|K_{*}\right\|\right).

\hfill\blacksquare

-C Proof of Theorem 1

Recall that the closed-loop dynamics can be characterized as in (6). Therefore,

xtW+Ws=0t2τ[t1:s+1](A^τ+B^τK^τ1).\displaystyle\left\|x_{t}\right\|\leq W+W\sum_{s=0}^{t-2}\left\|\prod_{\tau\in[t-1:s+1]}\left(\widehat{A}_{\tau}+\widehat{B}_{\tau}\widehat{K}_{\tau-1}\right)\right\|. (16)

Define

Lt\displaystyle L_{t} :=Ht1/2(A^t+B^tK^t1)Ht1/2,\displaystyle:=H_{t}^{-1/2}(\widehat{A}_{t}+\widehat{B}_{t}\widehat{K}_{t-1})H_{t}^{1/2},

where HtH_{t} is defined in (14). This gives,

\displaystyle\widehat{A}_{t}+\widehat{B}_{t}\widehat{K}_{t-1}=H_{t}^{1/2}L_{t}H_{t}^{-1/2}.

Therefore, each summand in (16) can be bounded as

τIs(A^τ+B^τK^τ1)\displaystyle\left\|\prod_{\tau\in I_{s}}\left(\widehat{A}_{\tau}+\widehat{B}_{\tau}\widehat{K}_{\tau-1}\right)\right\|
Ht11/2Hs+11/2(a)kIs+1Hk1/2Hk11/2(b)τIsLτ(c)\displaystyle\leq\underbrace{\left\|H_{t-1}^{1/2}\right\|\left\|H_{s+1}^{-1/2}\right\|}_{(a)}\underbrace{\prod_{k\in I_{s+1}}\left\|H_{k}^{-1/2}H_{k-1}^{1/2}\right\|}_{(b)}\underbrace{\prod_{\tau\in I_{s}}\left\|L_{\tau}\right\|}_{(c)} (17)

where we used IsI_{s} as shorthand for the interval [t1:s+1][t-1:s+1].

Bounding (a). We directly use the system-theoretic constant h_{*} introduced in Section -B, which gives (a) \leq h_{*}.

Bounding (b). Whenever \left\|\widehat{\theta}_{t}-\widehat{\theta}_{t-1}\right\|_{F}\leq\epsilon_{*} (so that the hypothesis of Lemma 6 is satisfied), Lemma 6 implies H_{t-1}\preceq(1+\eta_{t})H_{t}, i.e., H_{t}^{-1/2}H_{t-1}H_{t}^{-1/2}\preceq(1+\eta_{t})I_{n}. Therefore, we have

Ht11/2Ht1/2\displaystyle\left\|H_{t-1}^{1/2}H_{t}^{-1/2}\right\| (1+ηt)1/21+ηt/2eηt/2.\displaystyle\leq(1+\eta_{t})^{1/2}\leq 1+\eta_{t}/2\leq e^{\eta_{t}/2}.

Hence, using that each H_{t} is symmetric (so that \left\|H_{t}^{-1/2}H_{t-1}^{1/2}\right\|=\left\|H_{t-1}^{1/2}H_{t}^{-1/2}\right\|), we obtain

Ht1/2Ht11/2\displaystyle\left\|H_{t}^{-1/2}H_{t-1}^{1/2}\right\| {ecβθ^tθ^t1F2,θ^tθ^t1Fϵhotherwise.\displaystyle\leq\begin{cases}e^{\frac{c_{*}\beta_{*}\left\|\widehat{\theta}_{t}-\widehat{\theta}_{t-1}\right\|_{F}}{2}},&\left\|\widehat{\theta}_{t}-\widehat{\theta}_{t-1}\right\|_{F}\leq\epsilon_{*}\\ h_{*}&\mbox{otherwise}.\end{cases} (18)

Bounding (c). Lemma 4 implies that if θ^tθ^t1Fϵ\left\|\widehat{\theta}_{t}-\widehat{\theta}_{t-1}\right\|_{F}\leq\epsilon_{*} then

(A^t+B^tK^t1)Ht(A^t+B^tK^t1)(112Pt1)Ht.\left(\widehat{A}_{t}+\widehat{B}_{t}\widehat{K}_{t-1}\right)^{\top}H_{t}\left(\widehat{A}_{t}+\widehat{B}_{t}\widehat{K}_{t-1}\right)\preceq\left(1-\frac{1}{2}\left\|P_{t}\right\|^{-1}\right)H_{t}.

This in turn implies that

LtLt\displaystyle L_{t}^{\top}L_{t} =Ht1/2(A^t+B^tK^t1)Ht(A^t+B^tK^t1)Ht1/2\displaystyle=H_{t}^{-1/2}(\widehat{A}_{t}+\widehat{B}_{t}\widehat{K}_{t-1})^{\top}H_{t}(\widehat{A}_{t}+\widehat{B}_{t}\widehat{K}_{t-1})H_{t}^{-1/2}
Ht1/2(112Pt1)HtHt1/2\displaystyle\preceq H_{t}^{-1/2}\left(1-\frac{1}{2}\left\|P_{t}\right\|^{-1}\right)H_{t}H_{t}^{-1/2}
(112Pt1)In.\displaystyle\preceq\left(1-\frac{1}{2}\left\|P_{t}\right\|^{-1}\right)I_{n}.

Hence \left\|L_{t}\right\|\leq\left(1-\frac{1}{2\left\|P_{*}\right\|}\right)^{1/2}, where we used \left\|P_{t}\right\|\leq\left\|P_{*}\right\|. To summarize,

Lt\displaystyle\left\|L_{t}\right\| {ρL:=(112P)1/2<1,θ^tθ^t1Fϵotherwise,\displaystyle\leq\begin{cases}\rho_{L}:=\left(1-\frac{1}{2\left\|P_{*}\right\|}\right)^{1/2}<1,&\left\|\widehat{\theta}_{t}-\widehat{\theta}_{t-1}\right\|_{F}\leq\epsilon_{*}\\ \ell_{*}&\mbox{otherwise},\end{cases} (19)

for some constant \ell_{*} such that, for all t\in\mathbb{N}_{+},

\left\|H_{t}^{1/2}(\widehat{A}_{t}+\widehat{B}_{t}\widehat{K}_{t-1})H_{t}^{-1/2}\right\|\leq\ell_{*}.
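The bound on \left\|L_{t}\right\| in the first case of (19) can also be spot-checked numerically for two nearby hypothetical estimates, forming the similarity transform that appears in the display above and comparing its norm against \left(1-\frac{1}{2}\left\|P_{t}\right\|^{-1}\right)^{1/2}; lqr_gain and P_and_H are reused from the sketch following (14):

import numpy as np
from scipy.linalg import sqrtm

A_prev = np.array([[0.9, 0.2], [0.0, 0.8]]); B_prev = np.array([[0.0], [1.0]])
A_cur, B_cur = A_prev + 1e-4, B_prev + 1e-4   # nearby hypothetical estimate update

K_prev = lqr_gain(A_prev, B_prev)             # gain computed from the previous estimate
P_cur, H_cur = P_and_H(A_cur, B_cur)          # P_t, H_t for the current estimate

Hs = np.real(sqrtm(H_cur))                    # H_t^{1/2}
T = Hs @ (A_cur + B_cur @ K_prev) @ np.linalg.inv(Hs)
bound = np.sqrt(1 - 0.5 / np.linalg.norm(P_cur, 2))
print(np.linalg.norm(T, 2) <= bound)          # expect True for small estimate gaps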

Combining (a,b,c). We now plug the bounds (18) and (19) into (17). Let \widehat{\Delta}_{[s,e]}:=\sum_{\tau=s+1}^{e}\left\|\widehat{\theta}_{\tau}-\widehat{\theta}_{\tau-1}\right\|_{F} denote the partial-path movement of the selected hypothesis models, and let {\Delta}_{[s,e]}:=\sum_{\tau=s+1}^{e}\left\|\theta_{\tau}-\theta_{\tau-1}\right\|_{F} denote the partial-path variation of the true models. We also denote by n_{s,t} the number of indices \tau with s+1\leq\tau\leq t-1 for which \left\|\widehat{\theta}_{\tau}-\widehat{\theta}_{\tau-1}\right\|_{F}>\epsilon_{*}. Note that n_{s,t}\leq\widehat{\Delta}_{[s,t-1]}/\epsilon_{*}. Therefore,

τ[t1:s+1](A^τ+B^τK^τ1)\displaystyle\left\|\prod_{\tau\in[t-1:s+1]}\left(\widehat{A}_{\tau}+\widehat{B}_{\tau}\widehat{K}_{\tau-1}\right)\right\|
hhns,tecβΔ^[s+1,t1]2ns,tρLts1ns,t\displaystyle\leq h_{*}\cdot h_{*}^{n_{s,t}}\cdot e^{\frac{c_{*}\beta_{*}\widehat{\Delta}_{[s+1,t-1]}}{2}}\cdot\ell_{*}^{n_{s,t}}\cdot\rho_{L}^{t-s-1-n_{s,t}}
h(hρL)Δ^[s,t1]ϵecβΔ^[s+1,t1]2ρLts1\displaystyle\leq h_{*}\left(\frac{\ell_{*}h_{*}}{\rho_{L}}\right)^{\frac{\widehat{\Delta}_{[s,t-1]}}{\epsilon_{*}}}e^{\frac{c_{*}\beta_{*}\widehat{\Delta}_{[s+1,t-1]}}{2}}\cdot\rho_{L}^{t-s-1}
h(hρL)n¯(𝖽𝗂𝖺(Θ)+2κ+Δ[s,t1])ϵecβn¯(𝖽𝗂𝖺(Θ)+2κ+Δ[s+1,t1])2ρLts1\displaystyle\leq h_{*}\left(\frac{\ell_{*}h_{*}}{\rho_{L}}\right)^{\frac{\bar{n}\left(\mathsf{dia}(\Theta)+2\kappa+{\Delta}_{[s,t-1]}\right)}{\epsilon_{*}}}\cdot e^{\frac{c_{*}\beta_{*}\bar{n}\left(\mathsf{dia}(\Theta)+2\kappa+{\Delta}_{[s+1,t-1]}\right)}{2}}\cdot\rho_{L}^{t-s-1}
=:c0c1Δ[s,t1]ρLts1.\displaystyle=:c_{0}\cdot c_{1}^{\Delta_{[s,t-1]}}\rho_{L}^{t-s-1}.

where \bar{n}:=n(n+m) is the dimension of the parameter space for [A_{t}\ B_{t}]. Finally, plugging the above into (16) gives

xt\displaystyle\left\|x_{t}\right\| W(1+c0s=0t2c1Δ[s,t1]ρLts1).\displaystyle\leq W\left(1+c_{0}\sum_{s=0}^{t-2}c_{1}^{\Delta_{[s,t-1]}}\rho_{L}^{t-s-1}\right).

\blacksquare
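In particular, since \Delta_{[s,t-1]}\leq\Delta_{[0,t-1]} for every 0\leq s\leq t-2 and \sum_{s=0}^{t-2}\rho_{L}^{t-s-1}\leq\frac{\rho_{L}}{1-\rho_{L}}, the bound above further implies

\left\|x_{t}\right\|\leq W\left(1+\frac{c_{0}\,\rho_{L}}{1-\rho_{L}}\max\{1,c_{1}\}^{\Delta_{[0,t-1]}}\right),

so the state remains uniformly bounded whenever the total variation \Delta_{[0,t-1]} of the underlying models remains bounded.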