
Solving Reach- and Stabilize-Avoid Problems Using Discounted Reachability

Boyang Li*, Zheng Gong*, and Sylvia Herbert. *Both authors contributed equally to this work. All authors are with Mechanical and Aerospace Engineering at UC San Diego (e-mail: {bol025, zhgong, sherbert}@ucsd.edu).
(October 2024)
Abstract

In this article, we consider the infinite-horizon reach‐avoid (RA) and stabilize‐avoid (SA) zero-sum game problems for general nonlinear continuous-time systems, where the goal is to find the set of states that can be controlled to reach or stabilize to a target set, without violating constraints even under the worst-case disturbance. Based on the Hamilton-Jacobi reachability method, we address the RA problem by designing a new Lipschitz continuous RA value function, whose zero sublevel set exactly characterizes the RA set. We establish that the associated Bellman backup operator is contractive and that the RA value function is the unique viscosity solution of a Hamilton–Jacobi variational inequality. Finally, we develop a two-step framework for the SA problem by integrating our RA strategies with a recently proposed Robust Control Lyapunov-Value Function, thereby ensuring both target reachability and long-term stability. We numerically verify our RA and SA frameworks on a 3D Dubins car system to demonstrate the efficacy of the proposed approach.

I INTRODUCTION

Reach-Avoid (RA) problems have gained significant attention in recent years due to their broad range of applications in engineering, especially in controlling systems where safety or strategic decision-making is critical. The problem is characterized by a two-player zero-sum game, in which the control, as one player, aims to steer the system into a target set while avoiding an unsafe set, and the disturbance, as the other player, strives to prevent the control from succeeding.

The finite-horizon RA game aims to identify the set of initial states (called the RA set) and the corresponding control law that can achieve such goals, i.e., reaching and avoiding, within a predefined finite time horizon. For continuous-time systems, Hamilton-Jacobi (HJ) reachability analysis is a widely used approach to solve such problems [1, 2, 3, 4]. This method designs a value function whose gradients can be used for control synthesis, and whose sign at a given state characterizes the safety and performance of that state. The computation of this value function involves solving a time-dependent HJ partial differential equation [5], which typically relies on value iteration, i.e., recursively applying a Bellman backup operator until convergence.

However, finite-horizon formulations have a major drawback. Since the value function inherently depends on time, the associated Bellman backup operator changes with the remaining time-to-go and is not a contraction mapping. As a result, the convergence of value iteration becomes sensitive to initialization, and a suitable initialization is required to ensure convergence to the correct solution. In contrast, a contraction mapping guarantees convergence from any initialization and can even accelerate the computation when a good initialization is used; this property is the theoretical foundation of methods such as warm-starting [6] and reinforcement learning [7] that aim for efficient computation and scalability.

This limitation motivates the development of the infinite-horizon RA zero-sum game. For discrete-time systems, prior work leverages reinforcement learning to approximate RA value functions, where convergence is guaranteed by designing a contractive Bellman backup operator induced by a discount factor [8, 9, 10, 11, 12, 13]. However, these works either fail to provide deterministic safety guarantees or yield conservative RA sets in practice, although point-wise guarantees can still be achieved at runtime by safety filtering for individual states [11, 10, 12, 13]. By designing a new discounted RA value function and a set-based certification method, [14] addresses these issues.

On the other hand, there is no equivalent work for continuous-time systems that considers all of these settings. For example, using HJ-based methods, the authors in [2] study both the finite and infinite-horizon RA problem, with disturbances only present in the finite case; neither yields a contraction mapping. Moreover, their method relies on augmenting the state space by one more dimension, which introduces additional computational complexity that grows exponentially with problem dimensionality. The authors in [15] devise a contractive Bellman operator but only consider the reach or avoid case. Inspired by these prior works, we propose a new HJ-based method to solve the infinite-horizon RA game that yields a contractive Bellman backup operator and does not need to augment the state space.

Although reach-avoid games provide a fundamental framework for ensuring a system reaches a target while avoiding unsafe states, they inherently focus on goal achievement rather than long-term safety; there are no guarantees on remaining in the goal (or maintaining safety) once the goal has been reached or after the prescribed time horizon. To ensure that the system remains in the goal set, the notion of reach-avoid-stay (RAS) has been extensively studied, and a popular family of methods seeks a joint Control Lyapunov Barrier Function (CLBF), though these methods face distinct challenges. For discrete-time systems, [16, 17] resort to reinforcement learning for scalability, but neither provides verification methods for the learned RAS sets, i.e., safety and stay are not guaranteed in practice.

For continuous-time systems, [18, 19] lay the theoretical foundation for the existence of a CLBF when a given set satisfies the RAS condition. However, they assume the system to be affine in both control and disturbance and do not provide a constructive way to find a CLBF, which is a highly difficult task for high-dimensional or complex systems due to the lack of a universal construction method. [20] attempts to address this by leveraging the fitting power of neural networks to learn CLBFs, but the training process relies on the ability to sample from the control-invariant set. This becomes challenging when dealing with complex nonlinear dynamics, as the control-invariant set is not explicitly known. Note that in previous work on CLBFs, reach-avoid-stay is sometimes used interchangeably with stabilize-avoid (SA), as the Lyapunov property ensures stabilization of the system.

In this paper, we propose a new HJ-based method to solve both the infinite-horizon RA and SA games for general nonlinear continuous-time systems. Specifically, our main contributions are:

  1. We define a new RA value function $V_{\gamma}$ and prove that its zero sublevel set is exactly the RA set.

  2. We establish the theoretical foundation to compute $V_{\gamma}$: the dynamic programming principle (DPP), the fact that $V_{\gamma}$ is the unique viscosity solution to a Hamilton-Jacobi-Isaacs variational inequality (HJI-VI), and the contraction property of the Bellman backup operator associated with the DPP of $V_{\gamma}$.

  3. We present a two-step framework to construct the SA value function, and show that its zero sublevel set fully recovers the desired SA set.

  4. We show how to synthesize controllers for the RA and SA tasks.

To the best of our knowledge, both the proposed RA and SA frameworks are the first of their kind to solve the respective infinite-horizon games for general nonlinear continuous-time systems.

The rest of this article is organized as follows. In Section II, we define the RA and SA problems and introduce the concept of Robust Control Lyapunov-Value Functions (R-CLVF) for stabilization. In Section III, we introduce the value function $V_{\gamma}$ that solves the RA game and discuss its HJ characterizations. In Section IV, we integrate our RA formulation with the R-CLVF and propose a two-step framework to solve the SA game. Section V provides numerical examples, and Section VI concludes the article.

II BACKGROUND

In this section, we first define the system of interest alongside the RA and SA problems.

II-A Dynamic System

We consider a general nonlinear time-invariant system

\dot{x}(t) = f(x(t), \mathrm{u}(t), \mathrm{d}(t)), \quad \text{for } t\geq 0, \quad x(0) = x_{0}. (1)

The control signal $\mathrm{u}\colon[0,\infty)\rightarrow\mathcal{U}$ and the disturbance signal $\mathrm{d}\colon[0,\infty)\rightarrow\mathcal{D}$ are drawn from the set of Lebesgue measurable control signals $\mathbb{U}\coloneqq\{\mathrm{u}\colon[0,\infty)\rightarrow\mathcal{U}\mid\mathrm{u}\text{ is Lebesgue measurable}\}$ and the set of Lebesgue measurable disturbance signals $\mathbb{D}\coloneqq\{\mathrm{d}\colon[0,\infty)\rightarrow\mathcal{D}\mid\mathrm{d}\text{ is Lebesgue measurable}\}$, where $\mathcal{U}\subset\mathbb{R}^{m_{u}}$ and $\mathcal{D}\subset\mathbb{R}^{m_{p}}$ are compact and convex.

Assume the dynamics $f\colon\mathbb{R}^{n}\times\mathcal{U}\times\mathcal{D}\mapsto\mathbb{R}^{n}$ is uniformly continuous in $(x,u,d)$, Lipschitz continuous in $x$ for all $u\in\mathcal{U}$, $d\in\mathcal{D}$, and bounded for all $x\in\mathbb{R}^{n}$, $u\in\mathcal{U}$, $d\in\mathcal{D}$. Under these assumptions, given an initial state $x$ and control and disturbance signals $\mathrm{u},\mathrm{d}$, there exists a unique solution $\xi^{\mathrm{u},\mathrm{d}}_{x}(t)$ of the system (1). When the initial condition, control, and disturbance signal used are not important, we write $\xi(t)$ for the solution, which is also called the trajectory. We further assume the disturbance signal can be determined as a strategy with respect to the control signal, $\lambda\colon\mathbb{U}\mapsto\mathbb{D}$, drawn from the set of non-anticipative maps $\lambda\in\Lambda$. With this non-anticipative assumption, the disturbance has an instantaneous informational advantage over the control.
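To fix ideas, the dynamics and signal classes above translate directly into a numerical simulator. The following Python sketch uses a hypothetical Dubins-car-style instance of $f$ (anticipating the example in Section V); the unit speed, input bounds, and explicit-Euler discretization are illustrative assumptions rather than part of the formal development.

    import numpy as np

    # A hypothetical instance of the dynamics f(x, u, d) in (1): a Dubins car
    # with unit speed, heading-rate control u in U = [-pi, pi], and additive
    # position disturbance d in D = [-0.2, 0.2]^2 (compact, convex sets).
    def f(x, u, d):
        """Right-hand side of x_dot = f(x, u, d) for a 3D Dubins car."""
        return np.array([np.cos(x[2]) + d[0],
                         np.sin(x[2]) + d[1],
                         u])

    def step(x, u, d, dt=0.01):
        """One explicit-Euler step of the trajectory xi(t); holding u and d
        constant on [t, t + dt] yields Lebesgue-measurable signals."""
        return x + dt * f(x, u, d)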

II-B Reach-Avoid and Stabilize-Avoid Problems

Let the open sets $\mathcal{T}\coloneqq\{x\colon\ell(x)<0\}\subseteq\mathbb{R}^{n}$ and $\mathcal{C}\coloneqq\{x\colon c(x)<0\}\subseteq\mathbb{R}^{n}$ denote the target set and the constraint set, which are characterized by some Lipschitz continuous, bounded cost function $\ell(x)$ and constraint function $c(x)$, respectively. In this paper, we solve the following problems:

Problem 1: Specify the set of states that can be controlled to the target set safely in finite time (i.e., there exists some finite $T\geq 0$ such that $\xi_{x}^{\mathrm{u},\lambda[\mathrm{u}]}(T)\in\mathcal{T}$), under the worst-case disturbance. We call this set the RA set:

\mathcal{RA}(\mathcal{T},\mathcal{C}) \coloneqq \bigl\{ x\in\mathbb{R}^{n} \colon \forall\lambda\in\Lambda,\ \exists\,\mathrm{u}\in\mathbb{U} \text{ and } T\geq 0, \text{ s.t. } \forall t\in[0,T],\ \xi_{x}^{\mathrm{u},\lambda[\mathrm{u}]}(t)\in\mathcal{C} \text{ and } \xi_{x}^{\mathrm{u},\lambda[\mathrm{u}]}(T)\in\mathcal{T} \bigr\}.

Note that although an individual state must reach the target while satisfying the constraints in finite time, we do not require the existence of a prescribed finite $T$ that works for all states in the RA set.

Problem 2: Specify the set of states that can be stabilized safely to a control invariant set in $\mathcal{T}$, under the worst-case disturbance. We call this set the SA set:

\mathcal{SA}(\mathcal{T},\mathcal{C}) \coloneqq \bigl\{ x\in\mathbb{R}^{n} \colon \forall\lambda\in\Lambda,\ \exists\,\mathrm{u}\in\mathbb{U} \text{ and } T\geq 0, \text{ s.t. } \forall t\geq 0,\ \xi_{x}^{\mathrm{u},\lambda[\mathrm{u}]}(t)\in\mathcal{C} \text{ and } \xi_{x}^{\mathrm{u},\lambda[\mathrm{u}]}(T)\in\mathcal{T},\ \lim_{t\rightarrow\infty}\min_{y\in\partial\mathcal{I}_{\text{m}}}\|\xi_{x}^{\mathrm{u},\lambda[\mathrm{u}]}(t)-y\|=0 \bigr\},

where $\mathcal{I}_{\text{m}}$ is the smallest robustly control invariant set, defined below after we introduce the R-CLVF. The finite-horizon RA problem can be solved by HJ reachability if a finite time horizon $T$ is specified, while the infinite-horizon RA and SA problems remain open.

To solve the SA problem, we will apply the R-CLVF [21]:

Definition 1.

The R-CLVF $V_{\gamma}^{\text{CLVF}}\colon D_{\gamma}\mapsto\mathbb{R}$ of (1) is

V_{\gamma}^{\text{CLVF}}(x;p) = \lim_{t\rightarrow\infty}\sup_{\lambda\in\Lambda}\inf_{\mathrm{u}\in\mathbb{U}}\bigl\{ \sup_{s\in[0,t]} e^{\gamma s}\, r(\xi(s);p) \bigr\}.

Here, $D_{\gamma}\subseteq\mathbb{R}^{n}$ is the domain, $\gamma\geq 0$ is a user-specified parameter that represents the desired decay rate, and $r(x;p)=\|x-p\|-a$, where $p$ is the desired point that we want to stabilize to, and $a$ is a parameter that depends on the system dynamics and will be explained later.

When $\gamma=0$ and $r(x;p)=\|x-p\|$, the value $V_{\gamma}^{\text{CLVF}}(x)$ represents the largest deviation from $p$ along the trajectory starting from $x$. The level set corresponding to the smallest value $H^{\infty}_{m}=\min_{x}V_{\gamma}^{\text{CLVF}}(x;p)$ is the smallest robustly control invariant set (SRCIS) $\mathcal{I}_{\text{m}}$ of $p$, defined in [21].

When $\gamma>0$ and $r(x;p)=\|x-p\|-H^{\infty}_{m}$ (i.e., $a=H^{\infty}_{m}$), the R-CLVF value $V_{\gamma}^{\text{CLVF}}(x)$ captures the largest exponentially amplified deviation of a trajectory starting from $x$ to $\mathcal{I}_{\text{m}}$, under the worst-case disturbance. If this value is finite, $x$ can be exponentially stabilized to $\mathcal{I}_{\text{m}}$ (Lem. 7 of [21]).

Theorem 1.

The relative state can be exponentially stabilized to $\mathcal{I}_{\text{m}}$ from $\mathcal{D}_{\gamma}\setminus\mathcal{I}_{\text{m}}$ if the R-CLVF exists in $\mathcal{D}_{\gamma}$, i.e., there exists $k>0$ such that for all $t\geq 0$,

\min_{a\in\partial\mathcal{I}_{\text{m}}}\|\xi(t)-a\| \leq k e^{-\gamma t} \min_{a\in\partial\mathcal{I}_{\text{m}}}\|x-a\|. (2)

For conciseness, we simplify $r(x;p)$ and $V_{\gamma}^{\text{CLVF}}(x;p)$ to $r(x)$ and $V_{\gamma}^{\text{CLVF}}(x)$, as $p$ is a hyperparameter. The R-CLVF can be computed by solving the following R-CLVF-VI until convergence:

0 = \max\bigl\{ r(x) - V_{\gamma}^{\text{CLVF}}(x),\ \max_{d\in\mathcal{D}}\min_{u\in\mathcal{U}} D_{x}V_{\gamma}^{\text{CLVF}}(x)\cdot f(x,u,d) + \gamma V_{\gamma}^{\text{CLVF}}(x) \bigr\}.

The R-CLVF optimal controller is

\pi_{H}(x) = \operatorname*{arg\,min}_{u\in\mathcal{U}} \max_{d\in\mathcal{D}} D_{x}V_{\gamma}^{\text{CLVF}}(x)\cdot f(x,u,d). (3)

The benefit of the R-CLVF is two-fold: 1) for each positive level set, it is guaranteed that

\max_{d\in\mathcal{D}}\min_{u\in\mathcal{U}} D_{x}V_{\gamma}^{\text{CLVF}}(x)\cdot f(x,u,d) \leq -\gamma V_{\gamma}^{\text{CLVF}}(x) < 0,

and therefore each positive level set is attractive; 2) for any $\gamma\geq 0$, the zero sublevel set of the R-CLVF is the largest robustly control invariant subset of the zero sublevel set of $r(x)$.
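To make the role of the value function's gradient concrete, the sketch below evaluates the min-max in (3) by brute force over sampled inputs. It is a simplified illustration rather than the authors' implementation: grad_V_clvf (an interpolant of the computed R-CLVF gradient), f, and the sampling sets u_samples, d_samples are all assumptions.

    import numpy as np

    def rclvf_controller(x, grad_V_clvf, f, u_samples, d_samples):
        """Sketch of the R-CLVF controller (3): among sampled controls, pick
        the one whose worst-case (over sampled disturbances) value of
        D_x V^CLVF(x) . f(x, u, d) is smallest."""
        grad = grad_V_clvf(x)
        best_u, best_val = None, np.inf
        for u in u_samples:
            worst = max(float(np.dot(grad, f(x, u, d))) for d in d_samples)
            if worst < best_val:
                best_u, best_val = u, worst
        return best_u

Up to sampling error, the selected control keeps $D_{x}V_{\gamma}^{\text{CLVF}}\cdot f \leq -\gamma V_{\gamma}^{\text{CLVF}}(x)$ wherever the R-CLVF-VI holds, which is what makes each positive level set attractive.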

III A NEW DISCOUNTED RA VALUE FUNCTION

In this section, we propose a discounted RA value function to solve the RA and SA problems, i.e., to characterize $\mathcal{RA}(\mathcal{T},\mathcal{C})$ and $\mathcal{SA}(\mathcal{T},\mathcal{C})$. The introduction of the discount factor is necessary to ensure that the value function is the unique viscosity solution of the corresponding HJI-VI; many other desirable properties, such as Lipschitz continuity and the contraction of the associated Bellman backup operator, also arise from the discount factor.

III-A Definition and Properties

Definition 2.

A time-discounted RA value function $V_{\gamma}(x)\colon\mathbb{R}^{n}\mapsto\mathbb{R}$ is defined as

V_{\gamma}(x) \coloneqq \sup_{\lambda\in\Lambda}\inf_{\mathrm{u}\in\mathbb{U}}\inf_{t\in[0,\infty)} \max\bigl\{ e^{-\gamma t}\ell(\xi(t)),\ \max_{s\in[0,t]} e^{-\gamma s} c(\xi(s)) \bigr\}, (4)

where $\xi$ solves (1) with initial state $x$ and $\gamma>0$. Intuitively, the control seeks to decrease the value along the trajectory, while the disturbance seeks to increase it. The infimum over time captures whether the trajectory ever reaches the target set $\mathcal{T}$, and the inner maximum checks whether it ever violates the safety constraints. That is, $V_{\gamma}(x)<0$ implies that there exists a control signal $\mathrm{u}$ that can steer the system from the initial state $x$ to the target without violating the constraints, even under the worst-case disturbance. The following proposition formalizes this claim.

Proposition 1.

(Exact Recovery of RA set)

\mathcal{RA}(\mathcal{T},\mathcal{C}) = \{x \colon V_{\gamma}(x) < 0\}. (5)

Proof. We first prove sufficiency. Suppose $x\in\mathcal{RA}(\mathcal{T},\mathcal{C})$. That is, given an arbitrary $\lambda\in\Lambda$, there exist $\hat{\mathrm{u}}\in\mathbb{U}$ and $T\geq 0$ such that $\max_{t\in[0,T]}c(\xi^{\hat{\mathrm{u}},\lambda[\hat{\mathrm{u}}]}_{x}(t))<0$ and $\ell(\xi^{\hat{\mathrm{u}},\lambda[\hat{\mathrm{u}}]}_{x}(T))<0$. Since $e^{-\gamma t}>0$ for all $t\in\mathbb{R}$, we have $\max_{t\in[0,T]}e^{-\gamma t}c(\xi^{\hat{\mathrm{u}},\lambda[\hat{\mathrm{u}}]}_{x}(t))<0$ and $e^{-\gamma T}\ell(\xi^{\hat{\mathrm{u}},\lambda[\hat{\mathrm{u}}]}_{x}(T))<0$. Therefore,

\inf_{\mathrm{u}\in\mathbb{U}} \inf_{T\in[0,\infty)} \max\bigl\{ e^{-\gamma T}\ell(\xi^{\mathrm{u},\lambda[\mathrm{u}]}_{x}(T)),\ \max_{t\in[0,T]} e^{-\gamma t} c(\xi^{\mathrm{u},\lambda[\mathrm{u}]}_{x}(t)) \bigr\} < 0.

Because $\lambda$ is arbitrary, $V_{\gamma}(x)<0$.

For necessity, let $x$ be such that $V_{\gamma}(x)<0$; that is, for any $\lambda\in\Lambda$ there exists $\hat{\mathrm{u}}\in\mathbb{U}$ such that

\inf_{t\in[0,\infty)} \max\bigl\{ e^{-\gamma t}\ell(\xi^{\hat{\mathrm{u}},\lambda[\hat{\mathrm{u}}]}_{x}(t)),\ \max_{s\in[0,t]} e^{-\gamma s} c(\xi^{\hat{\mathrm{u}},\lambda[\hat{\mathrm{u}}]}_{x}(s)) \bigr\} < 0.

Thus, there also exists $T\geq 0$ such that

\max\bigl\{ e^{-\gamma T}\ell(\xi^{\hat{\mathrm{u}},\lambda[\hat{\mathrm{u}}]}_{x}(T)),\ \max_{s\in[0,T]} e^{-\gamma s} c(\xi^{\hat{\mathrm{u}},\lambda[\hat{\mathrm{u}}]}_{x}(s)) \bigr\} < 0.

Since $e^{-\gamma t}>0$, both $\ell(\xi^{\hat{\mathrm{u}},\lambda[\hat{\mathrm{u}}]}_{x}(T))$ and $\max_{t\in[0,T]}c(\xi^{\hat{\mathrm{u}},\lambda[\hat{\mathrm{u}}]}_{x}(t))$ are negative, which means $x\in\mathcal{RA}(\mathcal{T},\mathcal{C})$. $\blacksquare$
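To build intuition for Definition 2 and Proposition 1, the payoff inside (4) can be evaluated along a single simulated trajectory for fixed control and disturbance signals. The sketch below is an illustrative approximation on a finite time grid, not the exact sup-inf over all signals; a negative value certifies only that this particular sampled trajectory reaches the target without violating the constraints.

    import numpy as np

    def discounted_ra_payoff(xs, ts, ell, c, gamma):
        """Discrete approximation of the Definition 2 payoff along one sampled
        trajectory, with xs[k] ~ xi(ts[k]):
            min_k max( e^{-gamma t_k} ell(xi(t_k)),
                       max_{j <= k} e^{-gamma t_j} c(xi(t_j)) ).
        ell and c are the target-cost and constraint functions."""
        running_c = -np.inf
        best = np.inf
        for x, t in zip(xs, ts):
            running_c = max(running_c, np.exp(-gamma * t) * c(x))
            best = min(best, max(np.exp(-gamma * t) * ell(x), running_c))
        return best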

As mentioned before, we now show how the discount factor guarantees the boundedness and Lipschitz continuity of the value function $V_{\gamma}$.

Proposition 2.

(Boundedness and Lipschitz Continuity) $V_{\gamma}$ is bounded and Lipschitz continuous in $\mathbb{R}^{n}$ if $L_{f}<\gamma$, where $L_{f}$ is the Lipschitz constant of $f$.

Proof. Define

p(\lambda,\mathrm{u},x,t) = \max\bigl\{ e^{-\gamma t}\ell(\xi(t)),\ \max_{s\in[0,t]} e^{-\gamma s} c(\xi(s)) \bigr\}.

Take $x_{1},x_{2}\in\mathbb{R}^{n}$ and any $\varepsilon>0$. Then, there exists $\hat{\lambda}\in\Lambda$ such that for any $\mathrm{u}\in\mathbb{U}$,

V_{\gamma}(x_{1}) \leq \inf_{t\in[0,\infty)} p(\hat{\lambda},\mathrm{u},x_{1},t) + \varepsilon.

Similarly, for any $\lambda\in\Lambda$, there exists $\hat{\mathrm{u}}\in\mathbb{U}$ such that

V_{\gamma}(x_{2}) \geq \inf_{t\in[0,\infty)} p(\lambda,\hat{\mathrm{u}},x_{2},t) - \varepsilon.

Combining the above two inequalities, we have

V_{\gamma}(x_{1}) - V_{\gamma}(x_{2}) \leq \inf_{t\in[0,\infty)} p(\hat{\lambda},\hat{\mathrm{u}},x_{1},t) - \inf_{t\in[0,\infty)} p(\hat{\lambda},\hat{\mathrm{u}},x_{2},t) + 2\varepsilon. (6)

Moreover, there exists $\hat{t}\in[0,\infty)$ such that

\inf_{t\in[0,\infty)} p(\hat{\lambda},\hat{\mathrm{u}},x_{2},t) \geq p(\hat{\lambda},\hat{\mathrm{u}},x_{2},\hat{t}) - \varepsilon, \qquad \inf_{t\in[0,\infty)} p(\hat{\lambda},\hat{\mathrm{u}},x_{1},t) \leq p(\hat{\lambda},\hat{\mathrm{u}},x_{1},\hat{t}).

Plugging this back into (6), we have

V_{\gamma}(x_{1}) - V_{\gamma}(x_{2}) \leq \inf_{t\in[0,\infty)} p(\hat{\lambda},\hat{\mathrm{u}},x_{1},t) - p(\hat{\lambda},\hat{\mathrm{u}},x_{2},\hat{t}) + 3\varepsilon \leq p(\hat{\lambda},\hat{\mathrm{u}},x_{1},\hat{t}) - p(\hat{\lambda},\hat{\mathrm{u}},x_{2},\hat{t}) + 3\varepsilon. (7)

By the definition of $p$, both of the following hold:

p(\hat{\lambda},\hat{\mathrm{u}},x_{2},\hat{t}) \geq e^{-\gamma\hat{t}}\ell(\xi^{\hat{\mathrm{u}},\hat{\lambda}[\hat{\mathrm{u}}]}_{x_{2}}(\hat{t})), (8)
p(\hat{\lambda},\hat{\mathrm{u}},x_{2},\hat{t}) \geq \max_{s\in[0,\hat{t}]} e^{-\gamma s} c(\xi^{\hat{\mathrm{u}},\hat{\lambda}[\hat{\mathrm{u}}]}_{x_{2}}(s)) \geq e^{-\gamma s} c(\xi^{\hat{\mathrm{u}},\hat{\lambda}[\hat{\mathrm{u}}]}_{x_{2}}(s)), (9)

for any $s\in[0,\hat{t}]$. Similarly, either of the following holds:

p(\hat{\lambda},\hat{\mathrm{u}},x_{1},\hat{t}) \leq e^{-\gamma\hat{t}}\ell(\xi^{\hat{\mathrm{u}},\hat{\lambda}[\hat{\mathrm{u}}]}_{x_{1}}(\hat{t})), (10)
p(\hat{\lambda},\hat{\mathrm{u}},x_{1},\hat{t}) \leq \max_{s\in[0,\hat{t}]} e^{-\gamma s} c(\xi^{\hat{\mathrm{u}},\hat{\lambda}[\hat{\mathrm{u}}]}_{x_{1}}(s)) \leq e^{-\gamma\hat{s}} c(\xi^{\hat{\mathrm{u}},\hat{\lambda}[\hat{\mathrm{u}}]}_{x_{1}}(\hat{s})), (11)

for some $\hat{s}\in[0,\hat{t}]$. Finally, plugging (8) and (10) back into (7), we have

V_{\gamma}(x_{1}) - V_{\gamma}(x_{2}) \leq e^{-\gamma\hat{t}}\ell(\xi^{\hat{\mathrm{u}},\hat{\lambda}[\hat{\mathrm{u}}]}_{x_{1}}(\hat{t})) - e^{-\gamma\hat{t}}\ell(\xi^{\hat{\mathrm{u}},\hat{\lambda}[\hat{\mathrm{u}}]}_{x_{2}}(\hat{t})) + 3\varepsilon \leq L_{\ell} e^{-\gamma\hat{t}} e^{L_{f}\hat{t}} \|x_{1}-x_{2}\| + 3\varepsilon \leq L_{\ell}\|x_{1}-x_{2}\| + 3\varepsilon, (12)

where $L_{\ell}$ and $L_{f}$ are the Lipschitz constants of $\ell$ and $f$, respectively. The second inequality is a result of Gronwall's inequality, and the third follows from the assumption that $L_{f}<\gamma$. Similarly, we can show that $V_{\gamma}(x_{2})-V_{\gamma}(x_{1})\leq L_{\ell}\|x_{1}-x_{2}\|+3\varepsilon$. Since $\varepsilon$ is arbitrary, we have $|V_{\gamma}(x_{1})-V_{\gamma}(x_{2})|\leq L_{\ell}\|x_{1}-x_{2}\|$. The case in which (9) and (11) are plugged into (7) is similar.

The boundedness of $V_{\gamma}$ follows from the boundedness of $\ell(x)$ and $c(x)$, together with the fact that $\lim_{t\to\infty}e^{-\gamma t}=0$. $\blacksquare$

Note that since the value function is Lipschitz continuous, it is differentiable almost everywhere by Rademacher’s Theorem [22, Ch.5.8.3].

III-B Hamilton-Jacobi Characterization of $V_{\gamma}$

We now establish the theoretical foundations to compute $V_{\gamma}$: the DPP as a result of Bellman's optimality principle, the fact that $V_{\gamma}$ is the unique viscosity solution to an HJI-VI, and the contraction property of the Bellman backup operator associated with the DPP of $V_{\gamma}$. These provide two approaches for the numerical computation. The first is to follow the procedure in [3], i.e., solving the HJI-VI and combining it with the DPP until convergence. The other is to perform value iteration based on the Bellman backup operator. In this paper, all numerical solutions are obtained following the first approach.

We first show the DPP, as it is the basis for proving the viscosity solution and deriving the Bellman backup operator.

Theorem 2.

(Dynamic Programming Principle). Suppose $\gamma>0$. For $x\in\mathbb{R}^{n}$ and $T>0$,

V_{\gamma}(x) = \sup_{\lambda\in\Lambda}\inf_{\mathrm{u}\in\mathbb{U}} \min\Bigl\{ \min_{t\in[0,T]} \max\bigl[ e^{-\gamma t}\ell(\xi(t)),\ \max_{s\in[0,t]} e^{-\gamma s} c(\xi(s)) \bigr],\ \max\bigl[ e^{-\gamma T} V_{\gamma}(\xi(T)),\ \max_{t\in[0,T]} e^{-\gamma t} c(\xi(t)) \bigr] \Bigr\}. (13)

Proof. By the definition of $V_{\gamma}(x)$,

V_{\gamma}(x) = \sup_{\lambda\in\Lambda}\inf_{\mathrm{u}\in\mathbb{U}} \min\Bigl\{ \min_{t\in[0,T]} \max\bigl[ e^{-\gamma t}\ell(\xi(t)),\ \max_{s\in[0,t]} e^{-\gamma s} c(\xi(s)) \bigr],\ \inf_{t\in[T,\infty)} \max\bigl[ e^{-\gamma t}\ell(\xi(t)),\ \max_{s\in[0,t]} e^{-\gamma s} c(\xi(s)) \bigr] \Bigr\}. (14)

Thus, it suffices to prove that the optimized last term (for both control and disturbance) in the minimum of (14) is identical to that of (13). Due to the time-invariance of the system (1), we can transform

\inf_{t\in[T,\infty)} \max\bigl[ e^{-\gamma t}\ell(\xi(t)),\ \max_{s\in[0,t]} e^{-\gamma s} c(\xi(s)) \bigr] = \inf_{t\in[T,\infty)} \max\bigl[ e^{-\gamma t}\ell(\xi(t)),\ \max_{s\in[T,t]} e^{-\gamma s} c(\xi(s)),\ \max_{s\in[0,T]} e^{-\gamma s} c(\xi(s)) \bigr] = \inf_{t\in[T,\infty)} \max\Bigl\{ \max\bigl[ e^{-\gamma t}\ell(\xi(t)),\ \max_{s\in[T,t]} e^{-\gamma s} c(\xi(s)) \bigr],\ \max_{s\in[0,T]} e^{-\gamma s} c(\xi(s)) \Bigr\}.

Taking $k\coloneqq t-T$ and $y\coloneqq\xi^{\mathrm{u},\mathrm{d}}_{x}(T)$, we get

\inf_{k\in[0,\infty)} \max\Bigl\{ \max\bigl[ e^{-\gamma(k+T)}\ell(\xi^{\mathrm{u},\mathrm{d}}_{y}(k)),\ \max_{s\in[0,k]} e^{-\gamma(s+T)} c(\xi^{\mathrm{u},\mathrm{d}}_{y}(s)) \bigr],\ \max_{s\in[0,T]} e^{-\gamma s} c(\xi(s)) \Bigr\} = \max\Bigl\{ \inf_{k\in[0,\infty)} \max\bigl[ e^{-\gamma(k+T)}\ell(\xi^{\mathrm{u},\mathrm{d}}_{y}(k)),\ \max_{s\in[0,k]} e^{-\gamma(s+T)} c(\xi^{\mathrm{u},\mathrm{d}}_{y}(s)) \bigr],\ \max_{s\in[0,T]} e^{-\gamma s} c(\xi(s)) \Bigr\},

where the infimum over $k$ of $\max_{s\in[0,T]}e^{-\gamma s}c(\xi(s))$ is dropped because this term is independent of $k$. Plugging the above into (14), and noticing that the optimization over the two intervals $t\in[0,T]$ and $t\in[T,\infty)$ is independent, we have

V_{\gamma}(x) = \sup_{\lambda\in\Lambda}\inf_{\mathrm{u}\in\mathbb{U}} \min\Bigl\{ \min_{t\in[0,T]} \max\bigl[ e^{-\gamma t}\ell(\xi(t)),\ \max_{s\in[0,t]} e^{-\gamma s} c(\xi(s)) \bigr],\ \max\bigl\{ \inf_{k\in[0,\infty)} \max[ e^{-\gamma(k+T)}\ell(\xi^{\mathrm{u},\mathrm{d}}_{y}(k)),\ \max_{s\in[0,k]} e^{-\gamma(s+T)} c(\xi^{\mathrm{u},\mathrm{d}}_{y}(s)) ],\ \max_{s\in[0,T]} e^{-\gamma s} c(\xi(s)) \bigr\} \Bigr\}
= \sup_{\lambda\in\Lambda}\inf_{\mathrm{u}\in\mathbb{U}} \min\Bigl\{ \min_{t\in[0,T]} \max\bigl[ e^{-\gamma t}\ell(\xi(t)),\ \max_{s\in[0,t]} e^{-\gamma s} c(\xi(s)) \bigr],\ \max\bigl\{ \sup_{\lambda\in\Lambda}\inf_{\mathrm{u}\in\mathbb{U}}\inf_{k\in[0,\infty)} \max[ e^{-\gamma(k+T)}\ell(\xi^{\mathrm{u},\mathrm{d}}_{y}(k)),\ \max_{s\in[0,k]} e^{-\gamma(s+T)} c(\xi^{\mathrm{u},\mathrm{d}}_{y}(s)) ],\ \max_{s\in[0,T]} e^{-\gamma s} c(\xi(s)) \bigr\} \Bigr\}
= \sup_{\lambda\in\Lambda}\inf_{\mathrm{u}\in\mathbb{U}} \min\Bigl\{ \min_{t\in[0,T]} \max\bigl[ e^{-\gamma t}\ell(\xi(t)),\ \max_{s\in[0,t]} e^{-\gamma s} c(\xi(s)) \bigr],\ \max\bigl\{ e^{-\gamma T} V_{\gamma}(\xi(T)),\ \max_{t\in[0,T]} e^{-\gamma t} c(\xi(t)) \bigr\} \Bigr\}, (15)

which is identical to the expression in (13). $\blacksquare$

Building on Theorem 2, Theorem 3 shows that $V_{\gamma}$ is the unique viscosity solution to the HJI-VI defined below. We follow the definition of the viscosity solution provided in [23].

Theorem 3.

(Unique Viscosity Solution to HJI-VI). Assume $f$ satisfies the assumptions of Sec. II-A, and that $\ell(x), c(x)$ are bounded and Lipschitz continuous. Then the value function $V_{\gamma}(x)$ defined in (4) is the unique viscosity solution of the following HJI-VI:

0 = \max\bigl\{ \min\{ \max_{d\in\mathcal{D}}\min_{u\in\mathcal{U}} D_{x}V_{\gamma}(x)\cdot f(x,u,d) - \gamma V_{\gamma}(x),\ \ell(x) - V_{\gamma}(x) \},\ c(x) - V_{\gamma}(x) \bigr\}. (16)

Proof. The structure of the proof follows the classical approach in [24], analogously to [3].

A continuous function is a viscosity solution of a partial differential equation if it is both a subsolution and a supersolution (defined below). We first prove that $V_{\gamma}$ is a viscosity subsolution of (16). Let $\psi\in C^{1}(\mathbb{R}^{n})$ be such that $V_{\gamma}-\psi$ attains a local maximum at $x_{0}$; without loss of generality, assume that this maximum is 0. We say that $V_{\gamma}$ is a subsolution of (16) if, for any such $\psi$,

\max\bigl\{ \min\{ H(x_{0}, D_{x}\psi(x_{0})) - \gamma\psi(x_{0}),\ \ell(x_{0}) - \psi(x_{0}) \},\ c(x_{0}) - \psi(x_{0}) \bigr\} \geq 0, (17)

where the Hamiltonian is defined as

H(x, D_{x}\psi(x)) \coloneqq \max_{d\in\mathcal{D}}\min_{u\in\mathcal{U}} D_{x}\psi(x)\cdot f(x,u,d).

Assume (17) is false; then the following must hold:

c(x_{0}) \leq \psi(x_{0}) - \varepsilon_{1}. (18)

Moreover, at least one of the following must hold:

\ell(x_{0}) \leq \psi(x_{0}) - \varepsilon_{2}, (19a)
H(x_{0}, D_{x}\psi(x_{0})) - \gamma\psi(x_{0}) \leq -\varepsilon_{3}, (19b)

for some $\varepsilon_{1},\varepsilon_{2},\varepsilon_{3}>0$. Suppose (18) and (19a) are true. We abbreviate $\xi^{\mathrm{u},\mathrm{d}}_{x_{0}}(\cdot)$ to $\xi(\cdot)$ whenever the statement holds for any $\mathrm{u},\mathrm{d}$. By continuity of $\ell$, $c$, and the system trajectories, there exists a small enough $\delta>0$ such that, for all $\mathrm{u}(\cdot)$, $\mathrm{d}(\cdot)$, $\tau\in[0,\delta]$,

e^{-\gamma\tau} c(\xi(\tau)) \leq \psi(x_{0}) - \frac{\varepsilon_{1}}{2} = V_{\gamma}(x_{0}) - \frac{\varepsilon_{1}}{2}, (20a)
e^{-\gamma\tau} \ell(\xi(\tau)) \leq \ell(x_{0}) - \frac{\varepsilon_{2}}{2} = V_{\gamma}(x_{0}) - \frac{\varepsilon_{2}}{2}. (20b)

Incorporating (20) into the dynamic programming principle (13), we have

V_{\gamma}(x_{0}) \leq \sup_{\lambda\in\Lambda}\inf_{\mathrm{u}\in\mathbb{U}} \min\bigl\{ \min_{t\in[0,\delta]} \max[ e^{-\gamma t}\ell(\xi(t)),\ \max_{s\in[0,t]} e^{-\gamma s} c(\xi(s)) ] \bigr\} \leq V_{\gamma}(x_{0}) - \min\Bigl\{ \frac{\varepsilon_{1}}{2}, \frac{\varepsilon_{2}}{2} \Bigr\},

which is a contradiction, since $\varepsilon_{1},\varepsilon_{2}>0$. Now suppose (18) and (19b) are true. By the definition of $H$, for any $\lambda\in\Lambda$,

\inf_{\mathrm{u}\in\mathbb{U}} D_{x}\psi(x_{0})\cdot f(x_{0},\mathrm{u}(0),\lambda[\mathrm{u}](0)) - \gamma\psi(x_{0}) \leq H(x_{0}, D_{x}\psi(x_{0})) - \gamma\psi(x_{0}) \leq -\varepsilon_{3}.

Then, for small enough $\delta>0$, (20a) holds and there exists $\bar{\mathrm{u}}\in\mathbb{U}$ such that

D_{x}\psi(\xi_{x_{0}}^{\bar{\mathrm{u}},\lambda}(\tau))\cdot f(\xi_{x_{0}}^{\bar{\mathrm{u}},\lambda}(\tau),\bar{\mathrm{u}}(\tau),\lambda[\bar{\mathrm{u}}](\tau)) - \gamma\psi(\xi_{x_{0}}^{\bar{\mathrm{u}},\lambda}(\tau)) \leq -\frac{\varepsilon_{3}}{2},

for all $\tau\in[0,\delta]$. Following the technique used in Theorem 1.10 of Chapter VIII in [24] with some modifications, we multiply both sides of the above inequality by $e^{-\gamma\tau}$ and integrate from 0 to $\delta$ to get

e^{-\gamma\delta}\psi(\xi_{x_{0}}^{\bar{\mathrm{u}},\lambda}(\delta)) - \psi(x_{0}) \leq -\frac{\varepsilon_{3}}{2}\delta.

Recalling that $V_{\gamma}-\psi$ attains a local maximum of 0 at $x_{0}$, we have

e^{-\gamma\delta}V_{\gamma}(\xi_{x_{0}}^{\bar{\mathrm{u}},\lambda}(\delta)) \leq V_{\gamma}(x_{0}) - \frac{\varepsilon_{3}}{2}\delta. (21)

Incorporating (20a) and (21) into the dynamic programming principle (13), we have

V_{\gamma}(x_{0}) \leq \sup_{\lambda\in\Lambda}\inf_{\mathrm{u}\in\mathbb{U}} \bigl\{ \max[ e^{-\gamma\delta}V_{\gamma}(\xi(\delta)),\ \max_{t\in[0,\delta]} e^{-\gamma t} c(\xi(t)) ] \bigr\} \leq V_{\gamma}(x_{0}) - \min\Bigl\{ \frac{\varepsilon_{1}}{2}, \frac{\varepsilon_{3}}{2}\delta \Bigr\},

which is a contradiction, since $\varepsilon_{1},\varepsilon_{3},\delta>0$. Therefore, we conclude that (17) must hold, and hence $V_{\gamma}$ is indeed a subsolution of (16).

We now proceed to prove that $V_{\gamma}$ is a viscosity supersolution of (16). Let $\psi\in C^{1}(\mathbb{R}^{n})$ be such that $V_{\gamma}-\psi$ attains a local minimum at $x_{0}$; without loss of generality, assume that this minimum is 0. We say that $V_{\gamma}$ is a supersolution of (16) if, for any such $\psi$,

\max\bigl\{ \min\{ H(x_{0}, D_{x}\psi(x_{0})) - \gamma\psi(x_{0}),\ \ell(x_{0}) - \psi(x_{0}) \},\ c(x_{0}) - \psi(x_{0}) \bigr\} \leq 0. (22)

Assume (22) is false; then either

c(x_{0}) \geq \psi(x_{0}) + \varepsilon_{1}, (23)

or both of the following are true:

\ell(x_{0}) \geq \psi(x_{0}) + \varepsilon_{2}, (24a)
H(x_{0}, D_{x}\psi(x_{0})) - \gamma\psi(x_{0}) \geq 2\varepsilon_{3}, (24b)

for some $\varepsilon_{1},\varepsilon_{2},\varepsilon_{3}>0$. Suppose (23) is true. Then, there exists a small enough $\delta>0$ such that, for all $\mathrm{u}(\cdot)$, $\mathrm{d}(\cdot)$, $\tau\in[0,\delta]$,

e^{-\gamma\tau} c(\xi(\tau)) \geq \psi(x_{0}) + \frac{\varepsilon_{1}}{2} = V_{\gamma}(x_{0}) + \frac{\varepsilon_{1}}{2}. (25)

Incorporating (25) into the dynamic programming principle (13), we get

V_{\gamma}(x) \geq \sup_{\lambda\in\Lambda}\inf_{\mathrm{u}\in\mathbb{U}} \min\bigl\{ \min_{t\in[0,\delta]} \max_{s\in[0,t]} e^{-\gamma s} c(\xi(s)),\ \max_{t\in[0,\delta]} e^{-\gamma t} c(\xi(t)) \bigr\} \geq V_{\gamma}(x_{0}) + \frac{\varepsilon_{1}}{2},

which is a contradiction, as $\varepsilon_{1}>0$. Now, suppose (24) holds. Then, there exists a small enough $\delta_{1}>0$ such that, for all $\mathrm{u}(\cdot)$, $\mathrm{d}(\cdot)$, $\tau\in[0,\delta_{1}]$,

e^{-\gamma\tau} \ell(\xi(\tau)) \geq \psi(x_{0}) + \frac{\varepsilon_{2}}{2} = V_{\gamma}(x_{0}) + \frac{\varepsilon_{2}}{2}, (26)

and there exists $\bar{\lambda}\in\Lambda$ such that

\varepsilon_{3} \leq \inf_{\mathrm{u}\in\mathbb{U}} D_{x}\psi(x_{0})\cdot f(x_{0},\mathrm{u}(0),\bar{\lambda}[\mathrm{u}](0)) - \gamma\psi(x_{0}) \leq D_{x}\psi(x_{0})\cdot f(x_{0},\mathrm{u}(0),\bar{\lambda}[\mathrm{u}](0)) - \gamma\psi(x_{0}),

for all $\mathrm{u}\in\mathbb{U}$. Then, there exists a small enough $\delta_{2}>0$ such that

D_{x}\psi(\xi_{x_{0}}^{\mathrm{u},\bar{\lambda}}(\tau))\cdot f(\xi_{x_{0}}^{\mathrm{u},\bar{\lambda}}(\tau),\mathrm{u}(\tau),\bar{\lambda}[\mathrm{u}](\tau)) - \gamma\psi(\xi_{x_{0}}^{\mathrm{u},\bar{\lambda}}(\tau)) \geq \frac{\varepsilon_{3}}{2},

for all $\tau\in[0,\delta_{2}]$. Similarly, we multiply both sides of the above inequality by $e^{-\gamma\tau}$ and integrate from 0 to $\delta=\min\{\delta_{1},\delta_{2}\}$ to get

e^{-\gamma\delta}\psi(\xi_{x_{0}}^{\mathrm{u},\bar{\lambda}}(\delta)) - \psi(x_{0}) \geq \frac{\varepsilon_{3}}{2}\delta.

Recalling that $V_{\gamma}-\psi$ attains a local minimum of 0 at $x_{0}$, we have

e^{-\gamma\delta}V_{\gamma}(\xi_{x_{0}}^{\mathrm{u},\bar{\lambda}}(\delta)) \geq V_{\gamma}(x_{0}) + \frac{\varepsilon_{3}}{2}\delta. (27)

Incorporating (26) and (27) into the dynamic programming principle (13), we have

V_{\gamma}(x_{0}) \geq \sup_{\lambda\in\Lambda}\inf_{\mathrm{u}\in\mathbb{U}} \bigl\{ \max[ e^{-\gamma\delta}V_{\gamma}(\xi(\delta)),\ \min_{t\in[0,\delta]} e^{-\gamma t}\ell(\xi(t)) ] \bigr\} \geq V_{\gamma}(x_{0}) + \max\Bigl\{ \frac{\varepsilon_{2}}{2}, \frac{\varepsilon_{3}}{2}\delta \Bigr\},

which is a contradiction, since $\varepsilon_{2},\varepsilon_{3},\delta>0$. Therefore, we conclude that (22) must hold, and hence $V_{\gamma}$ is indeed a supersolution of (16).

For uniqueness, we prove a comparison principle for our HJI-VI (16), using techniques similar to those in the classical comparison and uniqueness theorems (see Theorem 2.12 in [24]). We defer this proof to the appendix, and here show uniqueness from another perspective: the contraction mapping. $\blacksquare$

In fact, the uniqueness property can be seen as a result of the contraction property of the Bellman backup operator associated with the dynamic programming principle of $V_{\gamma}$ in (13). We define a Bellman backup operator $B_{T}\colon\operatorname{BUC}(\mathbb{R}^{n})\mapsto\operatorname{BUC}(\mathbb{R}^{n})$ for $T>0$ (where $\operatorname{BUC}(\mathbb{R}^{n})$ denotes the set of bounded and uniformly continuous functions $\mathbb{R}^{n}\mapsto\mathbb{R}$) as

B_{T}[V](x) \coloneqq \sup_{\lambda\in\Lambda}\inf_{\mathrm{u}\in\mathbb{U}} \min\bigl\{ \min_{t\in[0,T]} \max[ e^{-\gamma t}\ell(\xi(t)),\ \max_{s\in[0,t]} e^{-\gamma s} c(\xi(s)) ],\ \max[ e^{-\gamma T} V(\xi(T)),\ \max_{t\in[0,T]} e^{-\gamma t} c(\xi(t)) ] \bigr\}. (28)

With the help of the discount factor, we can show that this operator is a contraction mapping.

Theorem 4.

(Contraction Mapping). For any $V^{1},V^{2}\in\operatorname{BUC}(\mathbb{R}^{n})$,

\|B_{T}[V^{1}] - B_{T}[V^{2}]\|_{L^{\infty}} \leq e^{-\gamma T} \|V^{1} - V^{2}\|_{L^{\infty}}, (29)

and the RA value function $V_{\gamma}$ in (4) is the unique fixed-point solution to $V_{\gamma}=B_{T}[V_{\gamma}]$ for each $T>0$. Also, for any $V\in\operatorname{BUC}(\mathbb{R}^{n})$,

\lim_{T\rightarrow\infty} B_{T}[V] = V_{\gamma}. (30)

Proof. Define

w(\lambda,\mathrm{u},x) \coloneqq \max_{t\in[0,T]} e^{-\gamma t} c(\xi(t)), \qquad w^{i}(\lambda,\mathrm{u},x) \coloneqq e^{-\gamma T} V^{i}(\xi(T)),

for $i=1,2$. Then,

B_{T}[V^{i}](x) = \sup_{\lambda\in\Lambda}\inf_{\mathrm{u}\in\mathbb{U}} \min\bigl\{ \min_{t\in[0,T]} \max[ e^{-\gamma t}\ell(\xi(t)),\ \max_{s\in[0,t]} e^{-\gamma s} c(\xi(s)) ],\ \max[ w^{i}(\lambda,\mathrm{u},x),\ w(\lambda,\mathrm{u},x) ] \bigr\}.

Without loss of generality, let $V^{1}(x)\geq V^{2}(x)$. For any $\varepsilon>0$, there exists $\bar{\lambda}$ such that

B_{T}[V^{1}](x) - \varepsilon < \inf_{\mathrm{u}\in\mathbb{U}} \min\bigl\{ \max[ w^{1}(\bar{\lambda},\mathrm{u},x),\ w(\bar{\lambda},\mathrm{u},x) ],\ \min_{t\in[0,T]} \max[ e^{-\gamma t}\ell(\xi(t)),\ \max_{s\in[0,t]} e^{-\gamma s} c(\xi(s)) ] \bigr\} < \min\bigl\{ \max[ w^{1}(\bar{\lambda},\mathrm{u},x),\ w(\bar{\lambda},\mathrm{u},x) ],\ \min_{t\in[0,T]} \max[ e^{-\gamma t}\ell(\xi(t)),\ \max_{s\in[0,t]} e^{-\gamma s} c(\xi(s)) ] \bigr\},

for any $\mathrm{u}$, which indicates that both of the following hold:

B_{T}[V^{1}](x) - \varepsilon < \max[ w(\bar{\lambda},\mathrm{u},x),\ w^{1}(\bar{\lambda},\mathrm{u},x) ], (31)
B_{T}[V^{1}](x) - \varepsilon < \min_{t\in[0,T]} \max[ e^{-\gamma t}\ell(\xi(t)),\ \max_{s\in[0,t]} e^{-\gamma s} c(\xi(s)) ]. (32)

On the other hand, for any $\varepsilon>0$ and any $\lambda$, there exists $\bar{\mathrm{u}}$ s.t.

B_{T}[V^{2}](x) + \varepsilon > \min\bigl\{ \max[ w^{2}(\lambda,\bar{\mathrm{u}},x),\ w(\lambda,\bar{\mathrm{u}},x) ],\ \min_{t\in[0,T]} \max[ e^{-\gamma t}\ell(\xi(t)),\ \max_{s\in[0,t]} e^{-\gamma s} c(\xi(s)) ] \bigr\},

which indicates that either of the following holds:

B_{T}[V^{2}](x) + \varepsilon > \max[ w(\lambda,\bar{\mathrm{u}},x),\ w^{2}(\lambda,\bar{\mathrm{u}},x) ], (33)
B_{T}[V^{2}](x) + \varepsilon > \min_{t\in[0,T]} \max[ e^{-\gamma t}\ell(\xi(t)),\ \max_{s\in[0,t]} e^{-\gamma s} c(\xi(s)) ]. (34)

Combining (31) and (33), we have

B_{T}[V^{1}](x) - B_{T}[V^{2}](x) < 2\varepsilon + \max\{ w(\bar{\lambda},\bar{\mathrm{u}},x),\ w^{1}(\bar{\lambda},\bar{\mathrm{u}},x) \} - \max\{ w(\bar{\lambda},\bar{\mathrm{u}},x),\ w^{2}(\bar{\lambda},\bar{\mathrm{u}},x) \} \leq 2\varepsilon + \bigl| w^{1}(\bar{\lambda},\bar{\mathrm{u}},x) - w^{2}(\bar{\lambda},\bar{\mathrm{u}},x) \bigr| \leq 2\varepsilon + e^{-\gamma T} \sup_{x\in\mathbb{R}^{n}} \bigl| V^{1}(x) - V^{2}(x) \bigr|.

The second inequality holds since, for all $a,b,c\in\mathbb{R}$, $|\max\{a,b\}-\max\{a,c\}| \leq |b-c|$. Moreover, combining (32) and (34), we have $B_{T}[V^{1}](x) - B_{T}[V^{2}](x) < 2\varepsilon$, so the result follows in that case as well. As the above inequalities hold for all $x\in\mathbb{R}^{n}$ and $\varepsilon>0$, we have

\|B_{T}[V^{1}] - B_{T}[V^{2}]\|_{L^{\infty}(\mathbb{R}^{n})} \leq e^{-\gamma T} \|V^{1} - V^{2}\|_{L^{\infty}(\mathbb{R}^{n})}.

Since $V_{\gamma}$ is a fixed-point solution for all $T>0$, Banach's contraction mapping theorem [22] implies that $V_{\gamma}$ is the unique fixed-point solution to $B_{T}[V_{\gamma}](x)=V_{\gamma}(x)$ for all $T>0$. In addition, we have

\|B_{T}[V] - V_{\gamma}\|_{L^{\infty}(\mathbb{R}^{n})} \leq e^{-\gamma T} \|V - V_{\gamma}\|_{L^{\infty}(\mathbb{R}^{n})}

for all $V\in\operatorname{BUC}(\mathbb{R}^{n})$; thus we conclude (30). $\blacksquare$

III-C Computation of $V_{\gamma}$

Theorem 4 allows the use of various methods to compute the value function $V_{\gamma}$ via the operator $B_{T}[\cdot]$, which does not require a specific initialization. For instance, a popular method is to solve the finite-horizon HJ equation presented in the following proposition. The proof is analogous to the proof of Lemma 5 of [25].

Proposition 3.

(Finite-horizon HJI-VI for the computation of $V_{\gamma}$). For a given initial value function candidate $V^{0}\in\operatorname{BUC}(\mathbb{R}^{n})$, let $W\colon\mathbb{R}^{n}\times[0,T]\mapsto\mathbb{R}$ be the unique viscosity solution to the following terminal-value HJI-VI:

W(x,T) = \max\{\ell(x),\ c(x),\ V^{0}(x)\}, \quad \forall x\in\mathbb{R}^{n}, (35)
0 = \max\bigl\{ \min\{ D_{t}W(x,t) + \max_{d\in\mathcal{D}}\min_{u\in\mathcal{U}} D_{x}W(x,t)\cdot f(x,u,d) - \gamma W(x,t),\ \ell(x) - W(x,t) \},\ c(x) - W(x,t) \bigr\}, (36)

for $(x,t)\in\mathbb{R}^{n}\times[0,T]$.

Then we have $W(x,0)\equiv B_{T}[V^{0}](x)$.

In Prop. 3, any $V^{0}\in\textnormal{BUC}(\mathbb{R}^{n})$ works for the computation of $V_{\gamma}$; for instance, a straightforward choice of $V^{0}$ is $c(x)$. As $T\rightarrow\infty$, $D_{t}W(x,t)$ vanishes to 0 for all $x\in\mathbb{R}^{n}$.

Prop. 3 emphasizes that, given any initialization $V^{0}$, the result from the contraction mapping (Theorem 4) and the result from solving the terminal-value HJI-VI are the same, provided the terminal value of the HJI-VI satisfies (35).

Combining Theorem 4 and Prop. 3, we have

\lim_{T\rightarrow\infty} B_{T}[V^{0}] = \lim_{T\rightarrow\infty} W(x,0) = V_{\gamma}(x). (37)

The PDE (36) can be numerically solved backward in time from the terminal condition (35), by using well-established time-dependent level-set methods [26].
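As a rough sketch of this backward marching (a simplified first-order scheme on a coarse grid, not the level-set solver of [26]), the update below follows the discrete form of the DPP with horizon dt: propagate the value freely using the Hamiltonian $H=\max_{d}\min_{u}D_{x}W\cdot f$, then apply the reach (minimum with $\ell$) and avoid (maximum with $c$) projections. Central differences stand in for a proper upwind scheme, and ell, c, f, and the sampled input sets are assumptions about how the problem data are discretized.

    import numpy as np
    import itertools

    def hamiltonian(W, grid_axes, f, u_samples, d_samples):
        """Brute-force H(x, D_x W) = max_d min_u D_x W . f(x, u, d) on a grid.
        grid_axes is the list of 1-D coordinate arrays; gradients use central
        differences (a real level-set solver would use an upwind scheme)."""
        grads = np.gradient(W, *grid_axes)
        H = np.empty_like(W)
        for idx in itertools.product(*map(range, W.shape)):
            x = np.array([ax[i] for ax, i in zip(grid_axes, idx)])
            g = np.array([gr[idx] for gr in grads])
            H[idx] = max(min(float(g @ f(x, u, d)) for u in u_samples)
                         for d in d_samples)
        return H

    def hjivi_backward_step(W, ell, c, H, gamma, dt):
        """One first-order backward step of (36): free propagation plus the
        reach/avoid projections implied by the DPP over a horizon dt.
        W, ell, c are arrays sampled on the same grid."""
        W_free = W + dt * (H - gamma * W)
        return np.maximum(c, np.minimum(ell, W_free))

Starting from the terminal condition W = np.maximum(np.maximum(ell, c), V0) of (35) and iterating until the updates stall gives a coarse approximation of $B_{T}[V^{0}]$, and hence of $V_{\gamma}$ as $T$ grows.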

In addition, another line of methods enabled by Theorem 4 is based on time discretization, such as value iteration, to accurately solve for $V_{\gamma}$. The subsequent corollary of Theorem 4 establishes that value iteration, initialized with any function $V^{0}\in\text{BUC}(\mathbb{R}^{n})$, converges to $V_{\gamma}$ with a Q-linear convergence rate, as specified in (38). For a given time step $\Delta t$, the semi-Lagrangian approach can be utilized to approximate the exact Bellman operator (28), leading to a numerical approximation. Furthermore, as $\Delta t\to 0$, the resulting value function converges to $V_{\gamma}$ [15].

Corollary 1.

(Value Iteration). Let $V^{0}\in\text{BUC}(\mathbb{R}^{n})$ and consider a time step $\Delta t>0$. Define the sequence $\{V^{k}\}_{k=0}^{\infty}$ iteratively as $V^{k}\coloneqq B_{\Delta t}[V^{k-1}]$, for $k\in\mathbb{N}$. Then, the following holds:

\frac{\|V^{k+1}-V_{\gamma}\|_{\infty}}{\|V^{k}-V_{\gamma}\|_{\infty}} \leq e^{-\gamma\Delta t} < 1, (38)

which implies that $\lim_{k\to\infty}V^{k}=V_{\gamma}$.

Proof. This result follows directly from Theorem 4. \blacksquare
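A minimal sketch of this value iteration, assuming apply_bellman_backup is some numerical approximation of $B_{\Delta t}$ (for instance repeated application of the backward step sketched above, or a semi-Lagrangian update); the tolerance and iteration cap are illustrative choices, not prescribed by the corollary.

    import numpy as np

    def value_iteration(V0, apply_bellman_backup, dt, tol=1e-4, max_iters=10000):
        """Iterate V^k = B_dt[V^{k-1}] until successive iterates are close.
        By Corollary 1, the distance to V_gamma contracts by exp(-gamma*dt)
        per sweep, so any bounded uniformly continuous V0 converges."""
        V = V0.copy()
        for _ in range(max_iters):
            V_next = apply_bellman_backup(V, dt)
            if float(np.max(np.abs(V_next - V))) < tol:
                return V_next
            V = V_next
        return V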

III-D The Finite-horizon Value Function and Control Synthesis

In many HJ-reachability-based works, the optimal controller is synthesized from the gradient of the value function (see Sec. I.5 in [24]). However, it should be pointed out that the controller synthesized directly from the gradient of (4) does not guarantee finite-time reach-avoid of the target and constraint sets. Though Prop. 1 shows that $V_{\gamma}(x)<0$ implies there exists some $T$ such that $\ell(\xi(T))<0$ and $c(\xi(t))<0$ for all $t\in[0,T]$, it does not mean the optimal control determined from the gradient of $V_{\gamma}(x)$ is able to drive the system to the target at exactly time $T$. One reason is that the HJI-VI (16) does not provide any useful timing information ($\max_{d}\min_{u}\dot{V}_{\gamma}=\gamma V_{\gamma}$ has nothing to do with finite-time reach-avoid): the system can travel freely before $T$ and only then apply some control to reach the target.

In this section we discuss how to design a time-optimal RA controller using a finite-horizon version of the infinite-horizon RA value function discussed above.

Definition 3.

A finite-horizon RA value function $W(x,t)\colon\mathbb{R}^{n}\times[0,T]\rightarrow\mathbb{R}$ is defined as

W(x,t) \coloneqq \sup_{\lambda\in\Lambda}\inf_{\mathrm{u}\in\mathbb{U}} \min_{\tau\in[t,T]} \max\bigl\{ e^{-\gamma(\tau-t)}\ell(\xi^{\mathrm{u},\lambda[\mathrm{u}]}_{x,t}(\tau)),\ \max_{s\in[t,\tau]} e^{-\gamma(s-t)} c(\xi^{\mathrm{u},\lambda[\mathrm{u}]}_{x,t}(s)) \bigr\}. (39)

The zero sublevel set of $W(x,t)$ characterizes a finite-horizon RA set, i.e., it is the set of initial states that can reach the target at some $\tau\in[0,t]$ while avoiding the obstacle on $[0,\tau]$. We state without proof that (39) satisfies the corresponding DPP and is the unique viscosity solution to the HJI-VI (35)-(36) with $V^{0}(x)=\max\{\ell(x),c(x)\}$.

From the definition, for any fixed time horizon $T$, $W(x,0)\geq V_{\gamma}(x)$. This means the zero sublevel set of $W(x,0)$ provides an under-approximation of the infinite-horizon RA set. If we take $T\rightarrow\infty$, the finite-horizon RA value function at $t=0$ is exactly the same as (4), i.e., $W(x,0)=V_{\gamma}(x)$. Further, for a fixed state $x$ and time horizon $T$, as $t$ decreases from $T$ to 0, $W(x,t)$ is non-increasing, and the time (if it exists) at which $W(x,t)$ first decays to 0 is the minimal time for the trajectory starting from $x$ to reach the target while avoiding the obstacle. One optimal control signal along a trajectory $\xi(t)$ is given by

\pi_{RA}(\xi(t)) = \operatorname*{arg\,min}_{u\in\mathcal{U}} \max_{d\in\mathcal{D}} D_{x}W(\xi(t), T-t^{*})\cdot f(\xi(t),u,d), (40)

where $t^{*}\coloneqq\min\{t\colon W(\xi(t), T-t)=0\}$. Notice that here $\xi(t)$ and $\pi_{RA}(\xi(t))$ form an optimal control-trajectory pair [24].
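A simplified numerical sketch of (40): interp_W and interp_gradW are assumed interpolants of a stored $W(x,t)$ table and its spatial gradient, and the time index is re-evaluated at the current state rather than tracked along the optimal trajectory, which is only an approximation of the $t^{*}$ above.

    import numpy as np

    def time_to_go(x, interp_W, T, dt):
        """Smallest time-to-go tau with W(x, tau) <= 0 (a per-state stand-in
        for T - t* in (40)); returns T if no such tau is found."""
        for tau in np.arange(0.0, T + dt, dt):
            if interp_W(x, tau) <= 0.0:
                return tau
        return T

    def pi_ra(x, interp_W, interp_gradW, f, T, dt, u_samples, d_samples):
        """Pick the sampled control minimizing the worst-case (over sampled
        disturbances) inner product with the gradient of W, as in (40)."""
        tau = time_to_go(x, interp_W, T, dt)
        grad = interp_gradW(x, tau)
        return min(u_samples,
                   key=lambda u: max(float(np.dot(grad, f(x, u, d)))
                                     for d in d_samples))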

IV TWO-STEP FORMULATION FOR STABILIZE-AVOID PROBLEM

In this section, we propose a two-step method to solve the SA problem, by combining the RA formulation (4) and the R-CLVF. We further assume $\ell(x)$ is upper bounded by $r(x;p)=\|x-p\|-a$, i.e., $\ell(x)\leq r(x;p)$, and that the zero sublevel set of $r(x)$ contains some robust control invariant subset $\mathcal{I}$. With this assumption, we guarantee that there exists at least one sublevel set of the R-CLVF that is a strict subset of the target set.

The idea is straightforward: we treat one level set of the R-CLVF as the new target set. Ideally, this set should be the largest sublevel set $\mathcal{I}_{\text{M}}$ of $V_{\gamma}^{\text{CLVF}}$ contained in the target $\mathcal{T}$. To fit our RA framework, we define the shifted R-CLVF $\bar{V}_{\gamma}^{\text{CLVF}}\coloneqq V_{\gamma}^{\text{CLVF}}-M$, where $M$ is the level of $\mathcal{I}_{\text{M}}$, so that $\mathcal{I}_{\text{M}}=\{x\colon\bar{V}_{\gamma}^{\text{CLVF}}(x)<0\}$. Now we construct a new value function

V_{\gamma}^{\text{SA}}(x) \coloneqq \sup_{\lambda\in\Lambda}\inf_{\mathrm{u}\in\mathbb{U}}\inf_{t\in[0,\infty)} \max\bigl\{ e^{-\gamma t}\bar{V}_{\gamma}^{\text{CLVF}}(\xi(t)),\ \sup_{s\in[0,t]} e^{-\gamma s} c(\xi(s)) \bigr\}, (41)

where the cost function $\ell$ in (4) is replaced by $\bar{V}_{\gamma}^{\text{CLVF}}$. Next, we show that its zero sublevel set is the desired SA set.

Proposition 4.

(Exact Recovery of SA set)

\mathcal{SA}(\mathcal{T},\mathcal{C}) = \{x\colon V_{\gamma}^{\text{SA}}(x)<0\}. (42)

Proof. For sufficiency, note that $x\in\mathcal{SA}(\mathcal{T},\mathcal{C})$ implies there exists $\mathrm{u}$ such that, for any $\lambda$, it safely steers the system to $\mathcal{T}$ and the trajectory ultimately converges to $\mathcal{I}_{\text{m}}$. Since the trajectory is continuous, it must enter $\mathcal{I}_{\text{M}}$ at some time $T\geq 0$, i.e., $\bar{V}_{\gamma}^{\text{CLVF}}(\xi^{\mathrm{u},\lambda}_{x}(T))<0$ and $c(\xi(s))<0$ for all $s\in[0,T]$. Since we take the infimum over $[0,\infty)$ and the value at $t=T$ is already negative, we conclude that $V_{\gamma}^{\text{SA}}(x)<0$.

Now suppose $V_{\gamma}^{\text{SA}}(x)<0$. Then there exists some $T$ such that, for all $\lambda_{[0,T]}$, there exists $\mathrm{u}_{[0,T]}$ with

\max\bigl\{ e^{-\gamma T}\bar{V}_{\gamma}^{\text{CLVF}}(\xi(T)),\ \sup_{s\in[0,T]} e^{-\gamma s} c(\xi(s)) \bigr\} < 0.

Since the exponential is positive, this means $\bar{V}_{\gamma}^{\text{CLVF}}(\xi(T))<0$ and $c(\xi(s))<0$ for all $s\in[0,T]$. Further, since $\bar{V}_{\gamma}^{\text{CLVF}}(\xi(T))<0$, for all $\lambda_{[T,\infty)}$ there exists $\mathrm{u}_{[T,\infty)}$ such that $\lim_{t\rightarrow\infty}\min_{y\in\partial\mathcal{I}_{\text{m}}}\|\xi_{x}^{\mathrm{u},\lambda[\mathrm{u}]}(t)-y\|=0$. This means that for all $\lambda$, there exists $\mathrm{u}$ that steers the system to $\mathcal{T}$ and then safely stabilizes it to $\mathcal{I}_{\text{m}}$. $\blacksquare$

As mentioned before, the SA problem is solved in a two-step manner: we first reach one level set of the R-CLVF ($\mathcal{I}_{\text{M}}$), then use the R-CLVF to stabilize the system. Therefore, a feedback controller can also be synthesized in the two-step formulation: given any initial state in the SA set, use $\pi_{RA}$ (40) in the reach-avoid phase, and $\pi_{H}(x)$ (3) in the stabilize-avoid phase:

\pi_{\mathcal{SA}}(x) = \begin{cases} \pi_{RA}(x) & \text{if } \bar{V}_{\gamma}^{\text{CLVF}}(x) > 0, \\ \pi_{H}(x) & \text{otherwise.} \end{cases} (43)
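A minimal sketch of the switching law (43), assuming pi_ra and pi_h implement (40) and (3), and that V_clvf_shifted evaluates the shifted R-CLVF $\bar{V}_{\gamma}^{\text{CLVF}}$; all three callables are assumptions about how the computed value functions are stored.

    def pi_sa(x, V_clvf_shifted, pi_ra, pi_h):
        """Two-phase controller (43): reach-avoid control while outside I_M
        (shifted R-CLVF positive), R-CLVF control once inside."""
        return pi_ra(x) if V_clvf_shifted(x) > 0.0 else pi_h(x)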

V NUMERICAL EXAMPLES

We demonstrate that our reach- and stabilize-avoid frameworks solve the respective problems: they recover the $\mathcal{R}\mathcal{A}(\mathcal{T},\mathcal{C})$ and $\mathcal{S}\mathcal{A}(\mathcal{T},\mathcal{C})$ sets and steer or stabilize the system to $\mathcal{T}$ or $\mathcal{I}_{\text{m}}$. Consider the following 3D Dubins car example

\dot{x}_{1}=v\cos{x_{3}}+d_{1},\quad\dot{x}_{2}=v\sin{x_{3}}+d_{2},\quad\dot{x}_{3}=u,

where $v=1$, $u\in[-\pi,\pi]$, and $d_{1},d_{2}\in[-0.2,0.2]$. The target set is $\mathcal{T}=\{x\,|\,\ell(x)<0\}$, where $\ell(x)=(x_{1}-3.5)^{2}+(x_{2}-3.5)^{2}-1$, and the constraint set is $\mathcal{C}=\{x\,|\,\max(c_{1}(x),c_{2}(x),c_{3}(x))<0\}$, where $c_{1}(x)=1-(x_{1}+2)^{2}-(x_{2}+2)^{2}$, $c_{2}(x)=1-(x_{1}-3)^{2}-(x_{2}+3)^{2}$, and $c_{3}(x)=1-\max\bigl(\tfrac{|x_{1}-1|}{2},\,\tfrac{|x_{2}-\frac{1}{2}|}{1.5}\bigr)$.
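For reference, the system and the cost functions above translate directly into code; the sketch below simply restates these definitions and is not the HJ solver used to compute the value functions in Fig. 1.

import numpy as np

V_SPEED = 1.0  # v = 1

def dubins_dynamics(x, u, d):
    # 3D Dubins car, x = (x1, x2, x3), with u in [-pi, pi] and d in [-0.2, 0.2]^2.
    return np.array([V_SPEED * np.cos(x[2]) + d[0],
                     V_SPEED * np.sin(x[2]) + d[1],
                     u])

def ell(x):
    # Target set T = {ell < 0}: unit disk centered at (3.5, 3.5).
    return (x[0] - 3.5) ** 2 + (x[1] - 3.5) ** 2 - 1.0

def c(x):
    # Constraint set C = {c < 0}: the complement of the two disk obstacles
    # and the box obstacle.
    c1 = 1.0 - (x[0] + 2.0) ** 2 - (x[1] + 2.0) ** 2
    c2 = 1.0 - (x[0] - 3.0) ** 2 - (x[1] + 3.0) ** 2
    c3 = 1.0 - max(abs(x[0] - 1.0) / 2.0, abs(x[1] - 0.5) / 1.5)
    return max(c1, c2, c3)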

The computed SA value function $V_{\gamma}^{\text{SA}}(x)$ and the $\mathcal{S}\mathcal{A}(\mathcal{T},\mathcal{C})$ set are shown in Fig. 1. The regions outside the enclosed magenta regions compose the $\mathcal{S}\mathcal{A}(\mathcal{T},\mathcal{C})$ set. The SA and RA trajectories are shown in Fig. 2, with controllers synthesized according to Sec. III-D and, more specifically, Eq. (43).
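Trajectories qualitatively similar to Fig. 2 can be generated by rolling the closed loop forward with explicit Euler integration. The sketch below reuses the dynamics and controller sketches given above together with a sampled disturbance; all of these are illustrative placeholders, not the pipeline used to produce the figures.

import numpy as np

def rollout(x0, policy, disturbance, dt=0.01, T=15.0):
    # Explicit-Euler rollout of the 3D Dubins car under a feedback policy,
    # assuming dubins_dynamics from the sketch above.
    xs = [np.asarray(x0, dtype=float)]
    for k in range(int(T / dt)):
        x = xs[-1]
        u = policy(x)                    # e.g. pi_SA from the sketch after (43)
        d = disturbance(x, k * dt)       # e.g. samples in [-0.2, 0.2]^2
        xs.append(x + dt * dubins_dynamics(x, u, d))
    return np.array(xs)

# Illustrative usage: start at [-4, 4, 0] with a random bounded disturbance.
rng = np.random.default_rng(0)
traj = rollout(np.array([-4.0, 4.0, 0.0]),
               policy=lambda x: float(np.clip(np.arctan2(3.5 - x[1], 3.5 - x[0]) - x[2],
                                              -np.pi, np.pi)),
               disturbance=lambda x, t: rng.uniform(-0.2, 0.2, size=2))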

Figure 1: SA value function $V_{\gamma}^{\text{SA}}(x)$ and its level sets. The regions enclosed by the green and black solid lines are the target $\mathcal{T}$ and the obstacles, respectively, so the region outside the black solid lines is the constraint set $\mathcal{C}$. The regions enclosed by the magenta dashed lines indicate the zero superlevel set of $V_{\gamma}^{\text{SA}}(x)$, so the regions outside compose the $\mathcal{S}\mathcal{A}(\mathcal{T},\mathcal{C})$ set.
Figure 2: SA (left) and RA (right) trajectories of the 3D Dubins car that starts at $[-4;4;0]$ and aims to reach the green disk (target set $\mathcal{T}$). Left: the car safely reaches $\mathcal{I}_{\text{M}}$ (blue dashed lines) using $\pi_{RA}$ from (40) and then stabilizes to $\mathcal{I}_{\text{m}}$ (the torus region enclosed by red dashed lines) using $\pi_{H}$ from (43). Right: the car safely reaches $\mathcal{T}$ using $\pi_{RA}$ from (40), but does not remain in the target set; instead, the trajectory repeatedly leaves and returns.

VI CONCLUSIONS

In this article, we presented an HJ-based framework to solve both the infinite-horizon RA and SA games for general nonlinear continuous-time systems. We constructed a new discounted RA value function whose zero sublevel set exactly characterizes the RA set. Moreover, we showed that the introduction of the discount factor leads to several desirable properties of the value function: Lipschitz continuity under certain assumptions, uniqueness of the viscosity solution to the corresponding HJI-VI, and contraction of the associated Bellman backup operator, which guarantees convergence from arbitrary initializations. By integrating our RA strategy with the R-CLVF, we developed a two-step framework to construct an SA value function whose zero sublevel set fully recovers the desired SA set. Finally, we provided controller synthesis approaches for both the RA and SA tasks.

ACKNOWLEDGMENT

We thank Professor Donggun Lee from North Carolina State University, Jingqi Li and Jason Choi from UC Berkeley, Haimin Hu from Princeton University, and Sander Tonkens, Will Sharpless, and Dylan Hirsch from UCSD for their insightful comments and valuable discussions.

APPENDIX A: Proof of the Uniqueness of the Viscosity Solution

Proof. Let $V^{1},V^{2}$ be sub- and supersolutions of (3), respectively, and let $\|x\|$ denote the Euclidean norm of $x\in\mathbb{R}^{n}$. Consider $\Phi:\mathbb{R}^{2n}\to\mathbb{R}$,

\Phi(x,y):=V^{1}(x)-V^{2}(y)-\frac{\|x-y\|^{2}}{2\varepsilon}, \quad (44)

where $\varepsilon$ is a positive parameter to be chosen conveniently. Assume, for the sake of contradiction, that there exist $\delta>0$ and $\tilde{x}$ such that $V^{1}(\tilde{x})-V^{2}(\tilde{x})=\delta$. Then we have

\frac{\delta}{2}<\delta=\Phi(\tilde{x},\tilde{x})\leq\sup\Phi(x,y). \quad (45)

Since $\Phi$ is continuous and $\lim_{\|x\|+\|y\|\to\infty}\Phi(x,y)=-\infty$ for $x\neq y$, there exist $\bar{x},\bar{y}$ such that

\Phi(\bar{x},\bar{y})=\sup\Phi(x,y). \quad (46)

Thus, the inequality $\Phi(\bar{x},\bar{x})+\Phi(\bar{y},\bar{y})\leq 2\Phi(\bar{x},\bar{y})$ holds, so we easily get

\frac{\|\bar{x}-\bar{y}\|^{2}}{\varepsilon}\leq V^{1}(\bar{x})-V^{1}(\bar{y})+V^{2}(\bar{x})-V^{2}(\bar{y}). \quad (47)
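For completeness, this follows directly from the definition (44): expanding both sides of $\Phi(\bar{x},\bar{x})+\Phi(\bar{y},\bar{y})\leq 2\Phi(\bar{x},\bar{y})$ gives

V^{1}(\bar{x})-V^{2}(\bar{x})+V^{1}(\bar{y})-V^{2}(\bar{y})\leq 2V^{1}(\bar{x})-2V^{2}(\bar{y})-\frac{\|\bar{x}-\bar{y}\|^{2}}{\varepsilon},

and moving terms across the inequality yields (47).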

Then the boundedness of $V^{1}$ and $V^{2}$ implies

\|\bar{x}-\bar{y}\|\leq k\sqrt{\varepsilon} \quad (48)

for a suitable constant $k$ independent of $x,y$. By plugging (48) into (47) and using the uniform continuity of $V^{1}$ and $V^{2}$, we get

\frac{\|\bar{x}-\bar{y}\|^{2}}{\varepsilon}\leq\omega(\sqrt{\varepsilon}), \quad (49)

for some modulus $\omega$, i.e., a function $\omega:[0,+\infty)\to[0,+\infty)$ that is continuous, nondecreasing, and satisfies $\omega(0)=0$. Next, define the $C^{1}$ test functions

\varphi(x):=V^{2}(\bar{y})+\frac{\|x-\bar{y}\|^{2}}{2\varepsilon},\qquad\phi(y):=V^{1}(\bar{x})-\frac{\|\bar{x}-y\|^{2}}{2\varepsilon},

and observe that, by the definition of $\bar{x},\bar{y}$, $V^{1}-\varphi$ attains its maximum at $\bar{x}$ and $V^{2}-\phi$ attains its minimum at $\bar{y}$. It is easy to compute

D_{x}\varphi(\bar{x})=\frac{\bar{x}-\bar{y}}{\varepsilon}=D_{y}\phi(\bar{y}). \quad (50)

By definition of viscosity sub- and supersolution, we have

\max\bigl\{\min\{H(\bar{x},D_{x}\varphi(\bar{x}))-\gamma V^{1}(\bar{x}),\;\ell(\bar{x})-V^{1}(\bar{x})\},\;c(\bar{x})-V^{1}(\bar{x})\bigr\}\geq 0, \quad (51)

\max\bigl\{\min\{H(\bar{y},D_{y}\phi(\bar{y}))-\gamma V^{2}(\bar{y}),\;\ell(\bar{y})-V^{2}(\bar{y})\},\;c(\bar{y})-V^{2}(\bar{y})\bigr\}\leq 0. \quad (52)

From (52) we have

c(\bar{y})-V^{2}(\bar{y})\leq 0, \quad (53)

and one of the following holds:

H(\bar{y},D_{y}\phi(\bar{y}))-\gamma V^{2}(\bar{y})\leq 0, \quad (54)

\ell(\bar{y})-V^{2}(\bar{y})\leq 0. \quad (55)

From (51) we have

c(\bar{x})-V^{1}(\bar{x})\geq 0, \quad (56)

or both of the following hold:

H(\bar{x},D_{x}\varphi(\bar{x}))-\gamma V^{1}(\bar{x})\geq 0, \quad (57)

\ell(\bar{x})-V^{1}(\bar{x})\geq 0. \quad (58)

Let us first assume that (56) and (53) hold. Rearranging the terms and using the Lipschitz continuity of $c$ together with (48), we get

V^{1}(\bar{x})-V^{2}(\bar{y})\leq c(\bar{x})-c(\bar{y})\leq L_{c}\|\bar{x}-\bar{y}\|\leq L_{c}k\sqrt{\varepsilon}, \quad (59)

which implies that $\Phi(\bar{x},\bar{y})\leq L_{c}k\sqrt{\varepsilon}$. Similarly, if (58) and (55) hold, we can show that

V^{1}(\bar{x})-V^{2}(\bar{y})\leq\ell(\bar{x})-\ell(\bar{y})\leq L_{\ell}\|\bar{x}-\bar{y}\|\leq L_{\ell}k\sqrt{\varepsilon}. \quad (60)

Finally, assuming (54) and (57) hold, we get

V^{1}(\bar{x})-V^{2}(\bar{y})\leq\frac{1}{\gamma}\bigl(H(\bar{x},D_{x}\varphi(\bar{x}))-H(\bar{y},D_{y}\phi(\bar{y}))\bigr). \quad (61)

By the compactness of $\mathcal{U}$ and $\mathcal{D}$ and the Lipschitz continuity of $f$ in $x$, the Hamiltonian $H$ has the property that, for a fixed $p\in\mathbb{R}^{n}$,

|H(x,p)-H(y,p)|\leq\|p\|L_{f}\|x-y\|, \quad (62)

for any $x,y\in\mathbb{R}^{n}$. Plugging (62) into (61) and invoking (50) and (49), we get

V^{1}(\bar{x})-V^{2}(\bar{y})\leq\frac{L_{f}}{\gamma}\omega(\sqrt{\varepsilon}). \quad (63)

Combining (59), (60), (63), and (44), we obtain

\Phi(\bar{x},\bar{y})\leq\max\Bigl\{L_{c}k\sqrt{\varepsilon},\;L_{\ell}k\sqrt{\varepsilon},\;\frac{L_{f}}{\gamma}\omega(\sqrt{\varepsilon})\Bigr\}, \quad (64)

and the right-hand side can be made smaller than $\frac{\delta}{2}$ for $\varepsilon$ small enough, a contradiction to (45) and (46). To conclude, we have proven that a subsolution of (3) cannot be larger than a supersolution. Since $V_{\gamma}$ is both a sub- and supersolution, it is the unique viscosity solution. $\blacksquare$

References

  • [1] K. Margellos and J. Lygeros, “Hamilton–Jacobi formulation for reach–avoid differential games,” IEEE Transactions on Automatic Control, vol. 56, no. 8, pp. 1849–1861, 2011.
  • [2] A. Altarovici, O. Bokanowski, and H. Zidani, “A general Hamilton-Jacobi framework for non-linear state-constrained control problems,” ESAIM. Control, Optimisation and Calculus of Variations, vol. 19, no. 2, pp. 337–357, Apr. 2013.
  • [3] J. F. Fisac, M. Chen, C. J. Tomlin, and S. S. Sastry, “Reach-avoid problems with time-varying dynamics, targets and constraints,” in Proceedings of the 18th International Conference on Hybrid Systems: Computation and Control.   New York, NY, USA: ACM, Apr. 2015. [Online]. Available: http://dx.doi.org/10.1145/2728606.2728612
  • [4] E. N. Barron, “Reach-avoid differential games with targets and obstacles depending on controls,” Dynamic Games and Applications, vol. 8, no. 4, pp. 696–712, 2018. [Online]. Available: https://doi.org/10.1007/s13235-017-0235-5
  • [5] I. Mitchell, A. Bayen, and C. Tomlin, “A time-dependent Hamilton-Jacobi formulation of reachable sets for continuous dynamic games,” IEEE Transactions on Automatic Control, vol. 50, no. 7, pp. 947–957, 2005.
  • [6] S. L. Herbert, S. Bansal, S. Ghosh, and C. J. Tomlin, “Reachability-based safety guarantees using efficient initializations,” in 2019 IEEE 58th Conference on Decision and Control (CDC), 2019, pp. 4810–4816.
  • [7] J. F. Fisac, N. F. Lugovoy, V. Rubies-Royo, S. Ghosh, and C. J. Tomlin, “Bridging Hamilton-Jacobi safety analysis and reinforcement learning,” in 2019 International Conference on Robotics and Automation (ICRA), 2019, pp. 8550–8556.
  • [8] K.-C. Hsu*, V. Rubies-Royo*, C. Tomlin, and J. Fisac, “Safety and liveness guarantees through reach-avoid reinforcement learning,” in Robotics: Science and Systems XVII.   Robotics: Science and Systems Foundation, Jul. 2021. [Online]. Available: http://dx.doi.org/10.15607/rss.2021.xvii.077
  • [9] K.-C. Hsu, A. Z. Ren, D. P. Nguyen, A. Majumdar, and J. F. Fisac, “Sim-to-lab-to-real: Safe reinforcement learning with shielding and generalization guarantees,” Artificial Intelligence, vol. 314, p. 103811, 2023. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0004370222001515
  • [10] Z. Li, C. Hu, W. Zhao, and C. Liu, “Learning predictive safety filter via decomposition of robust invariant set,” 2023. [Online]. Available: https://arxiv.org/abs/2311.06769
  • [11] K.-C. Hsu, D. P. Nguyen, and J. F. Fisac, “ISAACS: Iterative soft adversarial actor-critic for safety,” in Proceedings of the 5th Annual Learning for Dynamics and Control Conference, ser. Proceedings of Machine Learning Research, N. Matni, M. Morari, and G. J. Pappas, Eds., vol. 211. PMLR, 15–16 Jun 2023. [Online]. Available: https://proceedings.mlr.press/v211/hsu23a.html
  • [12] J. Wang, H. Hu, D. P. Nguyen, and J. F. Fisac, “MAGICS: Adversarial RL with minimax actors guided by implicit critic Stackelberg for convergent neural synthesis of robot safety,” 2024. [Online]. Available: https://arxiv.org/abs/2409.13867
  • [13] D. P. Nguyen*, K.-C. Hsu*, W. Yu, J. Tan, and J. F. Fisac, “Gameplay filters: Robust zero-shot safety through adversarial imagination,” in 8th Annual Conference on Robot Learning, 2024. [Online]. Available: https://openreview.net/forum?id=Ke5xrnBFAR
  • [14] J. Li, D. Lee, J. Lee, K. S. Dong, S. Sojoudi, and C. Tomlin, “Certifiable reachability learning using a new Lipschitz continuous value function,” IEEE Robotics and Automation Letters, vol. 9, no. 2, pp. 1–8, 2024. [Online]. Available: https://arxiv.org/pdf/2408.07866
  • [15] A. K. Akametalu, S. Ghosh, J. F. Fisac, V. Rubies-Royo, and C. J. Tomlin, “A minimum discounted reward Hamilton–Jacobi formulation for computing reachable sets,” IEEE Transactions on Automatic Control, vol. 69, no. 2, pp. 1097–1103, 2024.
  • [16] O. So and C. Fan, “Solving stabilize-avoid optimal control via epigraph form and deep reinforcement learning,” in Robotics: Science and Systems, Daegu, Republic of Korea, July 2023, pp. 10–14.
  • [17] G. Chenevert, J. Li, A. Kannan, S. Bae, and D. Lee, “Solving reach-avoid-stay problems using deep deterministic policy gradients,” Oct. 2024. [Online]. Available: http://arxiv.org/abs/2410.02898
  • [18] M. Z. Romdlony and B. Jayawardhana, “Stabilization with guaranteed safety using control Lyapunov–barrier function,” Automatica, vol. 66, pp. 39–47, 2016. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0005109815005439
  • [19] Y. Meng, Y. Li, M. Fitzsimmons, and J. Liu, “Smooth converse Lyapunov-barrier theorems for asymptotic stability with safety constraints and reach-avoid-stay specifications,” Automatica, vol. 144, p. 110478, 2022.
  • [20] C. Dawson, Z. Qin, S. Gao, and C. Fan, “Safe nonlinear control using robust neural Lyapunov-barrier functions,” in Proceedings of the 5th Conference on Robot Learning, ser. Proceedings of Machine Learning Research, A. Faust, D. Hsu, and G. Neumann, Eds., vol. 164. PMLR, 08–11 Nov 2022, pp. 1724–1735. [Online]. Available: https://proceedings.mlr.press/v164/dawson22a.html
  • [21] Z. Gong and S. Herbert, “Robust control Lyapunov-value functions for nonlinear disturbed systems,” 2024. [Online]. Available: https://arxiv.org/abs/2403.03455
  • [22] L. C. Evans, Partial Differential Equations, 2nd ed., ser. Graduate Studies in Mathematics. Providence, RI: American Mathematical Society, 2010, vol. 19.
  • [23] E. Barron and H. Ishii, “The Bellman equation for minimizing the maximum cost,” Nonlinear Analysis: Theory, Methods & Applications, vol. 13, no. 9, pp. 1067–1090, 1989. [Online]. Available: https://www.sciencedirect.com/science/article/pii/0362546X89900965
  • [24] M. Bardi and I. Capuzzo-Dolcetta, Optimal Control and Viscosity Solutions of Hamilton-Jacobi-Bellman Equations, 1st ed., ser. Modern Birkhäuser Classics. Cambridge, MA: Birkhäuser, May 2009.
  • [25] J. J. Choi, D. Lee, B. Li, J. P. How, K. Sreenath, S. L. Herbert, and C. J. Tomlin, “A forward reachability perspective on robust control invariance and discount factors in reachability analysis,” arXiv preprint arXiv:2310.17180, 2023.
  • [26] I. M. Mitchell and J. A. Templeton, “A toolbox of Hamilton-Jacobi solvers for analysis of nondeterministic continuous and hybrid systems,” in Int. Work. on Hybrid Sys.: Computation and Control. Springer, 2005.