Solving Reach- and Stabilize-Avoid Problems Using Discounted Reachability
Abstract
In this article, we consider the infinite-horizon reach‐avoid (RA) and stabilize‐avoid (SA) zero-sum game problems for general nonlinear continuous-time systems, where the goal is to find the set of states that can be controlled to reach or stabilize to a target set, without violating constraints even under the worst-case disturbance. Based on the Hamilton-Jacobi reachability method, we address the RA problem by designing a new Lipschitz continuous RA value function, whose zero sublevel set exactly characterizes the RA set. We establish that the associated Bellman backup operator is contractive and that the RA value function is the unique viscosity solution of a Hamilton–Jacobi variational inequality. Finally, we develop a two-step framework for the SA problem by integrating our RA strategies with a recently proposed Robust Control Lyapunov-Value Function, thereby ensuring both target reachability and long-term stability. We numerically verify our RA and SA frameworks on a 3D Dubins car system to demonstrate the efficacy of the proposed approach.
I INTRODUCTION
Reach-avoid (RA) problems have gained significant attention in recent years due to their broad range of applications in engineering, especially in controlling systems where safety or strategic decision-making is critical. The RA problem is characterized by a two-player zero-sum game, in which the control, as one player, aims to steer the system into a target set while avoiding an unsafe set, and the disturbance, as the other player, strives to prevent it from succeeding.
The finite-horizon RA game aims to identify the set of initial states (called the RA set) and the corresponding control law that can achieve such goals, i.e., reaching and avoiding, in a predefined finite time horizon. For continuous-time systems, Hamilton-Jacobi (HJ) reachability analysis is a widely used approach to solve such problems [1, 2, 3, 4]. This method allows for the design of a value function whose gradients can be used for control synthesis, and the sign of the value function at a given state characterizes the safety and performance of that state. The computation of this value function involves solving a time-dependent HJ partial differential equation [5], which typically relies on value iteration, recursively applying a Bellman backup operator until convergence.
However, finite-horizon formulations have a major drawback. Since the value function inherently depends on time, the associated Bellman backup operator changes with the remaining time-to-go and is not a contraction mapping. As a result, value iteration becomes sensitive to initialization, and a correct initialization is required to reach the correct solution. In contrast, a contraction mapping guarantees convergence from any initialization and can even accelerate the computation when a good initialization is used; these properties are the theoretical basis for methods such as warm-starting [6] and reinforcement learning [7] that aim for efficient computation and scalability.
This limitation motivates the development of the infinite-horizon RA zero-sum game. For discrete-time systems, prior work leverages reinforcement learning to approximate RA value functions, where the convergence is guaranteed by designing a contractive Bellman backup operator that is induced from a discount factor [8, 9, 10, 11, 12, 13]. However, these works fail to provide deterministic safety guarantees or yield conservative RA sets in practice, although point-wise guarantees can be achieved during runtime by safety filtering for individual states [11, 10, 12, 13]. By designing a new discounted RA value function and a set-based certification method, [14] addresses these issues.
On the other hand, there is no equivalent work for continuous-time systems that considers all of these settings. For example, using HJ-based methods, the authors in [2] study both the finite and infinite-horizon RA problem, with disturbances only present in the finite case; neither yields a contraction mapping. Moreover, their method relies on augmenting the state space by one more dimension, which introduces additional computational complexity that grows exponentially with problem dimensionality. The authors in [15] devise a contractive Bellman operator but only consider the reach or avoid case. Inspired by these prior works, we propose a new HJ-based method to solve the infinite-horizon RA game that yields a contractive Bellman backup operator and does not need to augment the state space.
Although reach-avoid games provide a fundamental framework for ensuring a system reaches a target while avoiding unsafe states, they inherently focus on goal achievement rather than long-term safety; there are no guarantees on remaining in the goal (or maintaining safety) once the goal has been reached or after the prescribed time horizon. To ensure that the system remains in the goal set, the notion of reach-avoid-stay (RAS) has been extensively studied, and a popular family of methods tries to find a joint Control Lyapunov Barrier Function (CLBF), though each of them faces distinct challenges. For discrete-time systems, [16, 17] resort to reinforcement learning for scalability, but neither provides verification methods for the learned RAS sets, i.e., safety and stay are not guaranteed in practice.
For continuous-time systems, [18, 19] lay the theoretical foundation for the existence of a CLBF when a given set satisfies the RAS condition. However, these works assume the system is affine in both control and disturbance and do not provide a constructive way to find a CLBF, which is a highly difficult task for high-dimensional or complex systems due to the lack of a universal construction method. [20] attempts to address this by leveraging the fitting power of neural networks to learn CLBFs, but the training process relies on the ability to sample from the control-invariant set. This becomes challenging for complex nonlinear dynamics, as the control-invariant set is not explicitly known. Note that in previous work on CLBFs, reach-avoid-stay is sometimes used interchangeably with stabilize-avoid (SA), since the Lyapunov property ensures stabilization of the system.
In this paper, we propose a new HJ-based method to solve both the infinite-horizon RA and SA games for general nonlinear continuous-time systems. Specifically, our main contributions are:
1. We define a new RA value function and prove that its zero sublevel set is exactly the RA set.
2. We establish the theoretical foundation to compute this value function: the dynamic programming principle (DPP), uniqueness as a viscosity solution to a Hamilton–Jacobi–Isaacs variational inequality (HJI-VI), and the contraction property of the Bellman backup operator associated with the DPP.
3. We present a two-step framework to construct the SA value function, and show that its zero sublevel set fully recovers the desired SA set.
4. We show how to synthesize controllers for the RA and SA tasks.
To the best of our knowledge, both the proposed RA and SA frameworks are the first of their kind to solve the respective infinite-horizon games for general nonlinear continuous-time systems.
The rest of this article is organized as follows. In Section II, we define the RA and SA problems and introduce the concept of Robust Control Lyapunov-Value Functions (R-CLVF) for stabilization. In Section III, we introduce the value function to solve the RA game, and discuss its HJ characterizations. In Section IV, we integrate our RA formulation with the R-CLVF and propose a two-step framework to solve the SA game. Section V provides numerical examples, and Section VI concludes the article.
II BACKGROUND
In this section, we first define the system of interest alongside the RA and SA problems.
II-A Dynamic System
We consider a general nonlinear time-invariant system
(1)  \(\dot{x}(t) = f\big(x(t), u(t), d(t)\big)\)
The control signal \(u(\cdot)\) and the disturbance signal \(d(\cdot)\) are drawn from the sets of Lebesgue measurable control and disturbance signals, respectively, whose value sets are compact and convex.
Assume the dynamics \(f\) is uniformly continuous in its arguments, Lipschitz continuous in the state, and bounded. Under these assumptions, given the initial state, control signal, and disturbance signal, there exists a unique solution of the system (1). When the initial condition, control, and disturbance signal used are not important, we simply refer to this solution as the trajectory. Further assume the disturbance can be determined as a strategy with respect to the control signal, drawn from the set of non-anticipative maps. With this non-anticipative assumption, the disturbance has an instantaneous advantage over the control.
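To make the setting concrete, the following minimal Python sketch simulates a trajectory of a system of the form (1) under given control and disturbance signals. The forward-Euler integrator, step size, and the single-integrator example dynamics are illustrative assumptions, not part of the formulation above.

```python
import numpy as np

def rollout(f, x0, u_sig, d_sig, dt=0.01, T=5.0):
    """Forward-Euler rollout of dx/dt = f(x, u, d) from x0.

    u_sig, d_sig: callables t -> control / disturbance value, assumed
    piecewise-constant over each step; dt, T: step size and horizon (assumptions).
    Returns the sample times and the state trajectory.
    """
    n_steps = int(T / dt)
    ts = np.linspace(0.0, n_steps * dt, n_steps + 1)
    xs = np.empty((n_steps + 1, np.size(x0)))
    xs[0] = x0
    for k in range(n_steps):
        xs[k + 1] = xs[k] + dt * np.asarray(f(xs[k], u_sig(ts[k]), d_sig(ts[k])))
    return ts, xs

# Illustrative example: a single integrator with bounded control and disturbance.
f = lambda x, u, d: u + d
ts, xs = rollout(f, x0=np.array([1.0]), u_sig=lambda t: -1.0, d_sig=lambda t: 0.1)
```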
II-B Reach-Avoid and Stabilize-Avoid Problems
Let two open sets denote the target set and the constraint set, characterized as the zero sublevel sets of some Lipschitz continuous, bounded cost function and constraint function, respectively. In this paper, we solve the following problems:
Problem 1: Specify the set of states that can be controlled to the target set safely in finite time (i.e., there exists some finite time at which the trajectory is inside the target set, with the constraints satisfied up to that time), under the worst-case disturbance. We call this set the RA set:
s.t., |
Note that although each individual state has to reach the target while satisfying the constraints in finite time, we do not require the existence of a prescribed finite horizon that works for all states in the RA set.
Problem 2: Specify the set of states that can be stabilized safely to a control invariant set contained in the target set, under the worst-case disturbance. We call this set the SA set:
s.t., | |||
where the stabilization target is the smallest robustly control invariant set, defined below after we introduce the R-CLVF. The finite-horizon RA problem can be solved by HJ reachability if a finite time horizon is specified, while the infinite-horizon RA and SA problems remain open.
To solve the SA problem, we will apply the R-CLVF [21]:
Definition 1.
R-CLVF of (1) is
Here, the value function is defined over a domain of interest, a user-specified parameter represents the desired decay rate, the state is measured relative to the desired point that we want to stabilize to, and the remaining parameter depends on the system dynamics and will be explained later.
When and , the value represents the largest deviation from along the trajectory starting from . The level set corresponding to the smallest value is the smallest robustly control invariant set (SRCIS) of , defined in [21].
When and (i.e., ), the R-CLVF value captures the largest exponentially amplified deviation of a trajectory starting from to the , under worst-case disturbance. If this value is finite, it means can be exponentially stabilized to (Lem. 7 of [21]).
Theorem 1.
The relative state can be exponentially stabilized to the from , if the R-CLVF exists in , i.e., ,
(2) |
For conciseness, we drop the explicit dependence on the decay-rate hyperparameter from the notation. The R-CLVF can be computed by solving the following R-CLVF-VI until convergence:
The R-CLVF optimal controller is
(3) |
The benefit of R-CLVF is two-fold: 1) for each positive level set, it is guaranteed that
and therefore each positive level set is attractive. 2) For any , the zero sublevel set of the R-CLVF is the largest robustly control invariant subset of the zero sub-level set of .
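As a rough numerical illustration of this attractivity property, the sketch below checks an exponential-decay certificate of a candidate value function along a sampled closed-loop trajectory. The specific decay inequality, dynamics, and candidate function here are simplified placeholders and do not reproduce the exact R-CLVF conditions of [21].

```python
import numpy as np

def check_exponential_decay(V, xs, ts, gamma):
    """Check V(x(t)) <= exp(-gamma * t) * V(x(0)) along a sampled trajectory,
    an illustrative stand-in for the exponential decay certified by the R-CLVF
    on its positive level sets (exact inequality not reproduced here)."""
    v0 = V(xs[0])
    return bool(np.all([V(x) <= np.exp(-gamma * t) * v0 + 1e-9
                        for x, t in zip(xs, ts)]))

# Illustrative check: dx/dt = -x with candidate V(x) = |x| and decay rate 1.
ts = np.linspace(0.0, 3.0, 200)
xs = np.exp(-ts)[:, None]                     # closed-loop trajectory of dx/dt = -x
print(check_exponential_decay(lambda x: abs(float(x[0])), xs, ts, gamma=1.0))  # True
```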
III A NEW DISCOUNTED RA VALUE FUNCTION
In this section, we propose a discounted RA value function to solve the RA and SA problems, i.e., to characterize the RA and SA sets. The discount factor is necessary to ensure that the value function is the unique viscosity solution of the corresponding HJI-VI; many other desirable properties, such as Lipschitz continuity and the contraction of the associated Bellman backup operator, also arise from it.
III-A Definition and Properties
Definition 2.
A time-discounted RA value function is defined as
(4) |
where the trajectory solves (1) with the given initial state. Intuitively, the control seeks to decrease the value along the trajectory, while the disturbance seeks to increase it. The infimum over time captures whether the trajectory ever reaches the target set, and the maximum checks if it ever violates the safety constraints. That is, a negative value implies that there exists a control signal that can steer the system from the initial state to the target without violating the constraints, even under the worst-case disturbance. The following proposition proves our claim.
Proposition 1.
(Exact Recovery of RA set)
(5) |
Proof. We first prove the sufficiency. Suppose . That is, given arbitrary such that and . Since , we have and . Therefore,
Because is arbitrary, .
For the necessity, let be such that , that is, for any such that,
Thus, there also exists such that
Since , both and are negative, which means .
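As a concrete, sample-based illustration of Proposition 1, the sketch below evaluates the undiscounted reach-avoid outcome of a single discretized trajectory under the sign convention used here (negative cost inside the target, negative constraint value when safe). The margin functions and trajectory are placeholders, and the discount factor of Definition 2 is omitted for simplicity; a negative outcome certifies reach-avoid success for that particular trajectory only.

```python
import numpy as np

def reach_avoid_outcome(xs, target_margin, constraint_margin):
    """Reach-avoid outcome of a discretized trajectory xs (shape [T+1, n]).

    Assumed sign convention: target_margin(x) < 0 inside the target set,
    constraint_margin(x) < 0 when the constraint is satisfied.
    Returns min over time of max(target margin now, worst constraint so far);
    a negative result means the trajectory reaches the target before ever
    violating the constraint.
    """
    l = np.array([target_margin(x) for x in xs])
    g = np.array([constraint_margin(x) for x in xs])
    worst_g_so_far = np.maximum.accumulate(g)
    return float(np.min(np.maximum(l, worst_g_so_far)))

# Illustrative 1D example: target |x| < 0.1, constraint |x| < 2.
xs = np.linspace(1.0, 0.0, 50)[:, None]          # a trajectory heading to the origin
target_margin = lambda x: np.abs(x[0]) - 0.1     # negative inside the target
constraint_margin = lambda x: np.abs(x[0]) - 2.0 # negative when safe
print(reach_avoid_outcome(xs, target_margin, constraint_margin) < 0)  # True
```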
As mentioned before, we now show how the discount factor guarantees the boundedness and Lipschitz continuity of the RA value function.
Proposition 2.
(Boundedness and Lipschitz Continuity) The RA value function is bounded and Lipschitz continuous in the state if the discount rate exceeds the Lipschitz constant of the dynamics.
Proof. Define
Take and any . Then, there exists such that for any ,
Similarly, for any , there exists such that
Combining the above two inequalities, we have
(6) |
Moreover, there exists such that
Plugging this back to (2), we have
(7) |
By definition of , both of the following hold:
(8) | ||||
(9) |
for any . Similarly, either of the following holds:
(10) | ||||
(11) |
for some . Finally, plugging (8) and (10) back to (2), we have
(12) |
where are the Lipschitz constants of and , respectively. The second inequality is a result of Gronwall’s inequality, and the third is due to the assumption that . Similarly, we can show that . Thus, we have . The proof for plugging (9) and (11) to (2) is similar.
The boundedness of follows from that of and , and the fact that .
Note that since the value function is Lipschitz continuous, it is differentiable almost everywhere by Rademacher’s Theorem [22, Ch.5.8.3].
III-B Hamilton-Jacobi Characterization of the RA Value Function
We now establish the theoretical foundations to compute the RA value function: the DPP as a result of Bellman's optimality principle, uniqueness as a viscosity solution to an HJI-VI, and the contraction property of the Bellman backup operator associated with the DPP. These results provide two approaches for the numerical computation. The first is to follow the procedure in [3], i.e., to solve the HJI-VI together with the DPP until convergence. The other is to perform value iteration based on the Bellman backup operator. In this paper, all numerical solutions are obtained using the first approach.
We first show the DPP, as it is the basis for proving the viscosity solution and deriving the Bellman backup operator.
Theorem 2.
(Dynamic Programming principle). Suppose . For ,
(13) |
Proof. By definition of ,
(14) |
Thus, it suffices to prove that the optimized last term (for both control and disturbance) in the minimum of (2) is identical to that of (2). Due to the time-invariance of the system (1), we can transform
Taking and , we get
where the infimum over for is dropped because it is independent of . Plugging the above into (2) and noticing that the optimization over the two intervals and is independent, we have
(15) |
which is identical to the expression in (2).
Building on Theorem 2, Theorem 3 shows that the RA value function is the unique viscosity solution to the HJI-VI defined below. We follow the definition of the viscosity solution provided in [23].
Theorem 3.
(Unique Viscosity Solution to HJI-VI). Assume the dynamics satisfies the assumptions stated for (1), and that the cost and constraint functions are bounded and Lipschitz continuous. Then the RA value function of Definition 2 is the unique viscosity solution of the following HJI-VI
(16) |
A continuous function is a viscosity solution of a partial differential equation if it is both a subsolution and a supersolution (defined below). We will first prove that the value function is a viscosity subsolution of (3). Let a smooth test function be such that the difference between the value function and the test function attains a local maximum at some point; without loss of generality, assume that this maximum is 0. We say that the value function is a subsolution of (3) if, for any such test function,
(17) |
where the Hamiltonian is defined as
Assuming (3) is false, then the following must hold:
(18) |
Moreover, at least one of the following must hold:
(19a) | |||
(19b) |
for some . Suppose (18) and (19a) are true. We abbreviate to whenever the statement holds for any . By continuity of and system trajectories, there exists small enough , such that for all ,
(20a) | |||
(20b) |
Incorporating (20) into the dynamic programming principle (2), we have
which is a contradiction, since . Now suppose (18) and (19b) are true. By definition of , for any
Then, for small enough , (20a) holds and there exists such that
for all . Following the technique used in Theorem 1.10 in Chapter VIII in [24] with some modifications, we multiply both sides of the above inequality by and integrate from to to get
Recalling that attains a local maximum of at , we have
(21) |
Incorporating (20a) and (21) into the dynamic programming principle (2), we have
which is a contradiction, since . Therefore, we conclude that (3) must be true and hence is indeed a subsolution of (3).
We now proceed to prove that the value function is a viscosity supersolution of (3). Let a smooth test function be such that the difference between the value function and the test function attains a local minimum at some point; without loss of generality, assume that this minimum is 0. We say that the value function is a supersolution of (3) if, for any such test function,
(22) |
Assume (3) is false, then either it must hold that
(23) |
or both of the following are true:
(24a) | |||
(24b) |
for some . Suppose (23) is true. Then, there exists small enough , such that for all ,
(25) |
Incorporating (25) into the dynamic programming principle (2), we get
, |
which is a contradiction, as . Now, suppose (24) holds. Then, there exists small enough , such that for all ,
(26) |
and there exists such that
for all . Then, there exists small enough such that
for all . Similarly, we multiply both sides of the above inequality by and integrate from to to get
Recalling that attains a local minimum of at , we have
(27) |
Incorporating (26) and (27) into the dynamic programming principle (2), we have
which is a contradiction, since . Therefore, we conclude that (3) must be true and hence is indeed a supersolution of (3).
In fact, the uniqueness property can be seen as a consequence of the contraction property of the Bellman backup operator associated with the dynamic programming principle. We define a time-parameterized Bellman backup operator on the set of bounded and uniformly continuous functions as
(28) |
With the help of the discount factor, we can show that this operator is a contraction mapping.
Theorem 4.
(Contraction mapping). For any ,
(29) |
and the RA value function is the unique fixed point of this operator for each time step. Also, for any,
(30) |
Proof. Define
for . Then,
Without loss of generality, let . For any , there exists , such that
for any , which indicates that both of the following hold:
(31) | ||||
(32) |
On the other hand, for any and any , there exists s.t.
which indicates that either of the following holds:
(33) | ||||
(34) |
Combining Eq.(31) and (33), we have
The second inequality holds since, for all . Moreover, combining Eq.(32) and (34), we have , so the result follows. As the above inequality holds for all and , we have
III-C Computation of the RA Value Function
Theorem 4 allows the use of various methods to compute the value function using the Bellman backup operator, which do not require a specific initialization. For instance, a popular method is to solve the finite-horizon HJ equation presented in the following proposition. The proof is analogous to that of Lemma 5 of [25].
Proposition 3.
(Finite-horizon HJI-VI for the computation of the RA value function). For a given initial value function candidate, let the finite-horizon value function be the unique viscosity solution to the following terminal-value HJI-VI
(35) | |||
(36) |
for .
Then the solution of this finite-horizon HJI-VI recovers the RA value function in the limit of an infinite horizon.
In Prop. 3, any initial value function candidate works for the computation of the RA value function. As the horizon tends to infinity, the influence of the initialization vanishes for all states.
Prop. 3 emphasizes that, given any initialization, the results from the contraction mapping (Theorem 4) and from solving the terminal-value HJI-VI are the same, provided the terminal value of the HJI-VI satisfies (35).
Combining Theorem 4 and Prop. 3, we have
(37) |
The PDE (36) can be numerically solved backward in time from the terminal condition (35), by using well-established time-dependent level-set methods [26].
In addition, another line of methods enabled by Theorem 4 is based on time discretization, such as value iteration, to accurately solve for the RA value function. The subsequent corollary of Theorem 4 establishes that value iteration, initialized with any bounded and uniformly continuous function, converges to the RA value function with a Q-linear convergence rate, as specified in (38). For a given time step, the semi-Lagrangian approach can be utilized to approximate the exact Bellman operator in (29), leading to a numerical approximation. Furthermore, as the time step vanishes, the resulting value function converges to the exact one [15].
Corollary 1.
(Value Iteration). Let and consider a time step . Define the sequence iteratively as . Then, the following holds:
(38) |
which implies that .
Proof. This result follows directly from Theorem 4.
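To illustrate the initialization-independent convergence promised by Theorem 4 and Corollary 1, the sketch below runs a grid-based value iteration with a discounted reach-avoid backup in discrete time, in the spirit of the discrete-time literature [8, 14] cited in the introduction. The backup form, toy dynamics, grid, and discount factor are illustrative assumptions rather than the exact semi-Lagrangian approximation of (29).

```python
import numpy as np

# Discounted discrete-time reach-avoid value iteration on a 1D grid (illustrative).
# Dynamics x_{k+1} = x_k + dt*u, u in {-1, 0, 1}; no disturbance for brevity.
# Sign convention as in the text: l(x) < 0 inside the target, g(x) < 0 when safe.
xs = np.linspace(-3.0, 3.0, 301)
dt, gamma = 0.05, 0.999
l = np.abs(xs) - 0.25        # target: |x| < 0.25
g = np.abs(xs) - 2.0         # constraint: |x| < 2
controls = np.array([-1.0, 0.0, 1.0])

V = np.maximum(l, g)         # any bounded initialization works (contraction)
for _ in range(2000):        # fixed number of sweeps; contraction guarantees convergence
    # best (minimizing) successor value over the sampled controls, via interpolation
    V_next = np.min(
        np.stack([np.interp(xs + dt * u, xs, V) for u in controls]), axis=0)
    # discounted reach-avoid Bellman backup (discrete-time analogue)
    V = (1 - gamma) * np.maximum(l, g) + gamma * np.maximum(g, np.minimum(l, V_next))

ra_set = xs[V < 0]           # approximate RA set: here roughly |x| < 2
```

Restarting this loop from a different bounded initialization (e.g., all zeros) yields the same fixed point, which is precisely what the contraction property buys in practice.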
III-D The Finite-horizon Value Function and Control Synthesis
In many HJ-reachability-based works, the optimal controller can be synthesized from the gradient of the value function (see Sec. I.5 in [24]). However, it should be pointed out that the controller synthesized directly from the gradient of the infinite-horizon RA value function does not guarantee finite-time reach-avoid of the target and constraint sets. Although Prop. 1 shows that a negative value implies there exists some finite time at which the target is reached with the constraints satisfied up to that time, it does not mean the optimal control determined from the gradient is able to drive the system to the target at exactly that time. One reason is that the stationary HJI-VI carries no timing information (it has nothing to do with finite-time reach-avoid): the system can travel freely for a while and only then apply a control that reaches the target.
In this subsection, we discuss how to design a time-optimal RA controller using a finite-horizon version of the infinite-horizon RA value function discussed above.
Definition 3.
A finite-horizon RA value function is defined as
(39) |
The zero sublevel set of this finite-horizon value function characterizes a finite-horizon RA set, i.e., the set of initial states that can reach the target at some time within the horizon while avoiding the obstacle up to that time. We state without proof that it satisfies the corresponding DPP and is the unique viscosity solution to the HJI-VI (35)-(36) with an appropriately chosen terminal value.
From the definition, for any fixed time horizon, the finite-horizon value is no smaller than the infinite-horizon value. This means the zero sublevel set of the finite-horizon value function provides an under-approximation of the infinite-horizon RA set. If we take the horizon to infinity, the finite-horizon RA value function coincides with the infinite-horizon one of Definition 2. Further, for a fixed state and time horizon, as the time argument decreases from the horizon to 0, the value is non-increasing, and the time at which the value first decays to zero (if it exists) determines the minimal time for the trajectory starting from that state to reach the target while avoiding the obstacle. One optimal control signal along a trajectory is given by
(40) |
Notice that the resulting control signal and the corresponding trajectory form an optimal control-trajectory pair [24].
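The sketch below illustrates this gradient-based synthesis idea in a sampled, discretized form: at each state, the control minimizing the worst-case directional derivative of the value along the dynamics is selected. The value function, its gradient, and the control and disturbance samples are placeholders, and this is a schematic stand-in for (40) rather than the exact optimizer; for control-affine systems the inner optimization admits a closed form.

```python
import numpy as np

def greedy_controller(grad_V, f, x, U, D):
    """Pick u (over sampled U) minimizing the worst-case (over sampled D)
    inner product grad_V(x) . f(x, u, d) -- a sampled stand-in for the
    gradient-based argmin/argmax used in HJ controller synthesis."""
    p = grad_V(x)
    best_u, best_val = None, np.inf
    for u in U:
        worst = max(float(np.dot(p, f(x, u, d))) for d in D)
        if worst < best_val:
            best_u, best_val = u, worst
    return best_u

# Illustrative single-integrator example with V(x) = |x| - 0.25 as a stand-in value.
f = lambda x, u, d: np.array([u + d])
grad_V = lambda x: np.sign(x)                 # gradient of |x| away from the origin
U = np.linspace(-1.0, 1.0, 21)
D = np.linspace(-0.2, 0.2, 5)
u_star = greedy_controller(grad_V, f, np.array([0.8]), U, D)   # drives x toward 0
```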
IV TWO-STEP FORMULATION FOR STABILIZE-AVOID PROBLEM
In this section, we propose a two-step method to solve the SA problem by combining the RA formulation and the R-CLVF. We further assume the cost function is upper bounded and that its zero sublevel set contains some robust control invariant subset. With this assumption, we guarantee that at least one sublevel set of the R-CLVF is a strict subset of the target set.
The idea is straightforward: we treat one level set of the R-CLVF as the new target set. Ideally, this set should be the largest sublevel set of the R-CLVF contained in the target set. To fit our RA framework, we shift the R-CLVF by the corresponding level so that this new target set becomes exactly the zero sublevel set of the shifted R-CLVF. Now we construct a new value function
(41) |
where the cost function in Definition 2 is replaced by the shifted R-CLVF. Next, we show that its zero sublevel set is the desired SA set.
Proposition 4.
(Exact Recovery of SA set)
(42) |
Proof. For sufficiency, note that membership in the SA set implies that, for every disturbance strategy, there exists a control signal that safely steers the system to the new target level set and ultimately converges to the SRCIS. Since the trajectory is continuous, it must enter the new target set at some finite time, with the constraints satisfied up to that time. Since we take the infimum over time, and at that time the value is already negative, we conclude that the SA value is negative.
Now suppose . Then, there exist some , for all , there exists s.t
Since the exponential factor is positive, this means the shifted R-CLVF becomes negative at some finite time while the constraint function remains negative up to that time. Further, since that state lies in a sublevel set of the R-CLVF, there exists a control that stabilizes it to the SRCIS. This means that, for every disturbance strategy, there exists a control that steers the system to the new target set and then safely stabilizes it to the SRCIS.
As mentioned before, the SA problem is solved in a two-step manner: we first reach one level set of the R-CLVF, then use the R-CLVF to stabilize the system. Therefore, a feedback controller can also be synthesized in the two-step formulation: given any initial state in the SA set, use (40) in the reach-avoid phase and (3) in the stabilize-avoid phase:
(43) |
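A minimal sketch of this switching logic is given below, assuming the reach-avoid controller, the R-CLVF controller, and the shifted R-CLVF are available as callables; the placeholder policies at the bottom only close the example and are not the synthesized controllers from (40) and (3).

```python
import numpy as np

def stabilize_avoid_controller(x, shifted_rclvf, u_reach_avoid, u_rclvf):
    """Two-step SA policy sketch: apply the reach-avoid controller until the
    shifted R-CLVF becomes non-positive (the state has entered the new target
    level set), then hand over to the R-CLVF stabilizing controller."""
    if shifted_rclvf(x) > 0.0:
        return u_reach_avoid(x)   # reach-avoid phase
    return u_rclvf(x)             # stabilize-avoid phase

# Illustrative closure of the loop with placeholder (non-optimal) sub-controllers:
shifted_rclvf = lambda x: np.linalg.norm(x) - 0.5
u_ra = lambda x: -np.sign(x)          # crude "head to the origin" policy
u_lyap = lambda x: -x                 # crude stabilizing policy
u = stabilize_avoid_controller(np.array([1.0, 0.0]), shifted_rclvf, u_ra, u_lyap)
```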
V NUMERICAL EXAMPLES
We demonstrate that our reach- and stabilize-avoid frameworks can solve the respective problems: find the RA and SA sets, and steer or stabilize the system to the target set or the SRCIS, respectively. Consider the following 3D Dubins car example
where , , and . The target set , where , and the constraint set , where , , and .
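For readers who wish to set up a similar example, the sketch below encodes a 3D Dubins car with an assumed constant speed, turn-rate and disturbance bounds, and circular target and obstacle margins. All numerical values are illustrative placeholders and do not correspond to the parameters used in our experiments.

```python
import numpy as np

# Placeholder parameters (assumptions, not the values used in the paper).
SPEED = 1.0              # assumed constant forward speed
U_MAX, D_MAX = 1.0, 0.1  # assumed turn-rate and disturbance bounds

def dubins(x, u, d):
    """3D Dubins car: state (px, py, theta), turn-rate control |u| <= U_MAX,
    additive planar disturbance with |d| <= D_MAX componentwise."""
    px, py, th = x
    u = np.clip(u, -U_MAX, U_MAX)
    d = np.clip(d, -D_MAX, D_MAX)
    return np.array([SPEED * np.cos(th) + d[0],
                     SPEED * np.sin(th) + d[1],
                     u])

# Placeholder margin functions following the sign convention used in this paper
# (negative inside the target / when the constraint is satisfied).
target_margin = lambda x: np.linalg.norm(x[:2]) - 0.5
constraint_margin = lambda x: 0.75 - np.linalg.norm(x[:2] - np.array([1.5, 0.0]))

# One forward-Euler step of the dynamics under some control and disturbance.
x = np.array([2.5, 0.0, np.pi])
x = x + 0.01 * dubins(x, u=0.3, d=np.array([0.05, 0.0]))
```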
The computed SA value function and the SA set are shown in Fig. 1. The regions outside the enclosed magenta regions compose the SA set. The SA and RA trajectories are shown in Fig. 2, with controllers synthesized according to Sec. III-D and, more specifically, Eq. (43).
VI CONCLUSIONS
In this article, we presented an HJ-based framework to solve both the infinite-horizon RA and SA games for general nonlinear continuous-time systems. We constructed a new discounted RA value function whose zero sublevel set exactly characterizes the RA set. Moreover, we showed that the introduction of the discount factor leads to many desirable properties of the value function: Lipschitz continuity under certain assumptions, uniqueness of the viscosity solution to a corresponding HJI-VI, and the contraction property of the associated Bellman backup operator that guarantees convergence from arbitrary initializations. By integrating our RA strategy with the R-CLVF, we developed a two-step framework to construct an SA value function whose zero sublevel set fully recovers the desired SA set. Finally, we provided controller synthesis approaches for both the RA and SA tasks.
ACKNOWLEDGMENT
We thank Professor Donggun Lee from North Carolina State University, Jingqi Li and Jason Choi from UC Berkeley, Haimin Hu from Princeton University, and Sander Tonkens, Will Sharpless, and Dylan Hirsch from UCSD for their insightful comments and valuable discussions.
-A Proof of the Uniqueness of the Viscosity Solution
Proof. Let be the sub- and supersolutions of (3), respectively. Let be the Euclidean norm of . Consider ,
(44) |
where is a positive parameter to be chosen conveniently. Let us assume by contradiction that there is and such that . Then we have,
(45) |
Since is continuous and when , there exist such that
(46) |
Thus, the inequality holds, so we easily get
(47) |
Then the boundedness of and implies
(48) |
for a suitable constant independent of the parameters. By plugging (48) into (47) and using the uniform continuity of and , we get
(49) |
for some modulus of continuity, i.e., a function that is continuous, nondecreasing, and vanishes at zero. Next, define the test functions
and observe that, by definition of , attains its maximum at and attains its minimum at . It is easy to compute
(50) |
By definition of viscosity sub- and supersolution, we have
(51) | ||||
(52) |
From (52) we have
(53) |
and one of the following holds:
(54) | |||
(55) |
From (51) we have
(56) |
or both of the following hold:
(57) | |||
(58) |
Let us first assume that (56) and (53) are true. After rearranging the terms and using (48), we get
(59) |
which implies that . Similarly, if (58) and (55) hold, we can show that
(60) |
Finally, assuming (54) and (57) hold, we get
(61) |
By the compactness of and and the Lipschitz continuity of in , the Hamiltonian has the property that for a fixed
(62) |
for any . Plugging (62) back to (61) and invoking (50) and (49), we get
(63) |
Combining (59), (60), (63) and (44), we obtain
(64) |
and the right-hand side can be made arbitrarily small by choosing the parameter small enough, a contradiction to (45) and (46). To conclude, we have proven that a subsolution of (3) cannot exceed a supersolution. Since the value function is both a sub- and supersolution, it is the unique viscosity solution.
References
- [1] K. Margellos and J. Lygeros, “Hamilton–Jacobi formulation for reach–avoid differential games,” IEEE Transactions on Automatic Control, vol. 56, no. 8, pp. 1849–1861, 2011.
- [2] A. Altarovici, O. Bokanowski, and H. Zidani, “A general Hamilton-Jacobi framework for non-linear state-constrained control problems,” ESAIM: Control, Optimisation and Calculus of Variations, vol. 19, no. 2, pp. 337–357, Apr. 2013.
- [3] J. F. Fisac, M. Chen, C. J. Tomlin, and S. S. Sastry, “Reach-avoid problems with time-varying dynamics, targets and constraints,” in Proceedings of the 18th International Conference on Hybrid Systems: Computation and Control. New York, NY, USA: ACM, Apr. 2015. [Online]. Available: http://dx.doi.org/10.1145/2728606.2728612
- [4] E. N. Barron, “Reach-avoid differential games with targets and obstacles depending on controls,” Dynamic Games and Applications, vol. 8, no. 4, pp. 696–712, 2018. [Online]. Available: https://doi.org/10.1007/s13235-017-0235-5
- [5] I. Mitchell, A. Bayen, and C. Tomlin, “A time-dependent Hamilton-Jacobi formulation of reachable sets for continuous dynamic games,” IEEE Transactions on Automatic Control, vol. 50, no. 7, pp. 947–957, 2005.
- [6] S. L. Herbert, S. Bansal, S. Ghosh, and C. J. Tomlin, “Reachability-based safety guarantees using efficient initializations,” in 2019 IEEE 58th Conference on Decision and Control (CDC), 2019, pp. 4810–4816.
- [7] J. F. Fisac, N. F. Lugovoy, V. Rubies-Royo, S. Ghosh, and C. J. Tomlin, “Bridging Hamilton-Jacobi safety analysis and reinforcement learning,” in 2019 International Conference on Robotics and Automation (ICRA), 2019, pp. 8550–8556.
- [8] K.-C. Hsu*, V. Rubies-Royo*, C. Tomlin, and J. Fisac, “Safety and liveness guarantees through reach-avoid reinforcement learning,” in Robotics: Science and Systems XVII. Robotics: Science and Systems Foundation, Jul. 2021. [Online]. Available: http://dx.doi.org/10.15607/rss.2021.xvii.077
- [9] K.-C. Hsu, A. Z. Ren, D. P. Nguyen, A. Majumdar, and J. F. Fisac, “Sim-to-lab-to-real: Safe reinforcement learning with shielding and generalization guarantees,” Artificial Intelligence, vol. 314, p. 103811, 2023. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0004370222001515
- [10] Z. Li, C. Hu, W. Zhao, and C. Liu, “Learning predictive safety filter via decomposition of robust invariant set,” 2023. [Online]. Available: https://arxiv.org/abs/2311.06769
- [11] K.-C. Hsu, D. P. Nguyen, and J. F. Fisac, “ISAACS: Iterative soft adversarial actor-critic for safety,” in Proceedings of the 5th Annual Learning for Dynamics and Control Conference, ser. Proceedings of Machine Learning Research, N. Matni, M. Morari, and G. J. Pappas, Eds., vol. 211. PMLR, 15–16 Jun 2023. [Online]. Available: https://proceedings.mlr.press/v211/hsu23a.html
- [12] J. Wang, H. Hu, D. P. Nguyen, and J. F. Fisac, “MAGICS: Adversarial RL with minimax actors guided by implicit critic Stackelberg for convergent neural synthesis of robot safety,” 2024. [Online]. Available: https://arxiv.org/abs/2409.13867
- [13] D. P. Nguyen*, K.-C. Hsu*, W. Yu, J. Tan, and J. F. Fisac, “Gameplay filters: Robust zero-shot safety through adversarial imagination,” in 8th Annual Conference on Robot Learning, 2024. [Online]. Available: https://openreview.net/forum?id=Ke5xrnBFAR
- [14] J. Li, D. Lee, J. Lee, K. S. Dong, S. Sojoudi, and C. Tomlin, “Certifiable reachability learning using a new lipschitz continuous value function,” IEEE Robotics and Automation Letters, vol. 9, no. 2, pp. 1–8, 2024. [Online]. Available: https://arxiv.org/pdf/2408.07866
- [15] A. K. Akametalu, S. Ghosh, J. F. Fisac, V. Rubies-Royo, and C. J. Tomlin, “A minimum discounted reward Hamilton–Jacobi formulation for computing reachable sets,” IEEE Transactions on Automatic Control, vol. 69, no. 2, pp. 1097–1103, 2024.
- [16] O. So and C. Fan, “Solving stabilize-avoid optimal control via epigraph form and deep reinforcement learning,” in Robotics: Science and Systems, Daegu, Republic of Korea, July 2023, pp. 10–14.
- [17] G. Chenevert, J. Li, A. Kannan, S. Bae, and D. Lee, “Solving reach-avoid-stay problems using deep deterministic policy gradients,” Oct. 2024. [Online]. Available: http://arxiv.org/abs/2410.02898
- [18] M. Z. Romdlony and B. Jayawardhana, “Stabilization with guaranteed safety using control Lyapunov–barrier function,” Automatica, vol. 66, pp. 39–47, 2016. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0005109815005439
- [19] Y. Meng, Y. Li, M. Fitzsimmons, and J. Liu, “Smooth converse Lyapunov-barrier theorems for asymptotic stability with safety constraints and reach-avoid-stay specifications,” Automatica, vol. 144, p. 110478, 2022.
- [20] C. Dawson, Z. Qin, S. Gao, and C. Fan, “Safe nonlinear control using robust neural Lyapunov-barrier functions,” in Proceedings of the 5th Conference on Robot Learning, ser. Proceedings of Machine Learning Research, A. Faust, D. Hsu, and G. Neumann, Eds., vol. 164. PMLR, 08–11 Nov 2022, pp. 1724–1735. [Online]. Available: https://proceedings.mlr.press/v164/dawson22a.html
- [21] Z. Gong and S. Herbert, “Robust control Lyapunov-value functions for nonlinear disturbed systems,” 2024. [Online]. Available: https://arxiv.org/abs/2403.03455
- [22] L. C. Evans, “Partial differential equations: Second edition,” in Partial Differential Equations: Second Edition, ser. Graduate Studies in Mathematics. Providence, RI: American Mathematical Society, 2010, vol. 19.
- [23] E. Barron and H. Ishii, “The Bellman equation for minimizing the maximum cost,” Nonlinear Analysis: Theory, Methods & Applications, vol. 13, no. 9, pp. 1067–1090, 1989. [Online]. Available: https://www.sciencedirect.com/science/article/pii/0362546X89900965
- [24] M. Bardi and I. Capuzzo-Dolcetta, Optimal Control and Viscosity Solutions of Hamilton-Jacobi-Bellman Equations, 1st ed., ser. Modern Birkhäuser Classics. Cambridge, MA: Birkhäuser, May 2009.
- [25] J. J. Choi, D. Lee, B. Li, J. P. How, K. Sreenath, S. L. Herbert, and C. J. Tomlin, “A forward reachability perspective on robust control invariance and discount factors in reachability analysis,” arXiv preprint arXiv:2310.17180, 2023.
- [26] I. M. Mitchell and J. A. Templeton, “A toolbox of Hamilton-Jacobi solvers for analysis of nondeterministic continuous and hybrid systems,” in Int. Work. on Hybrid Sys.: Computation and Control. Springer, 2005.