
Learning Performance-oriented Control Barrier Functions Under Complex Safety Constraints and Limited Actuation

Lakshmideepakreddy Manda
Department of Electrical and Computer Engineering
Johns Hopkins University
United States
lmanda1@jhu.edu
&Shaoru Chen
Microsoft Research
New York, United States
shaoruchen@microsoft.com
&Mahyar Fazlyab
Department of Electrical and Computer Engineering
Johns Hopkins University, United States
mahyarfazlyab@jhu.edu
Abstract

Control Barrier Functions (CBFs) provide an elegant framework for constraining nonlinear control system dynamics to remain within an invariant subset of a designated safe set. However, identifying a CBF that balances performance—by maximizing the control invariant set—and accommodates complex safety constraints, especially in systems with high relative degree and actuation limits, poses a significant challenge. In this work, we introduce a novel self-supervised learning framework to comprehensively address these challenges. Our method begins with a Boolean composition of multiple state constraints that define the safe set. We first construct a smooth function whose zero superlevel set forms an inner approximation of this safe set. This function is then combined with a smooth neural network to parameterize the CBF candidate. To train the CBF and maximize the volume of the resulting control invariant set, we design a physics-informed loss function based on a Hamilton-Jacobi Partial Differential Equation (PDE). We validate the efficacy of our approach on a 2D double integrator (DI) system and a 7D fixed-wing aircraft system (F16).

1 Introduction

Refer to caption
Figure 1: Illustration of the learned CBF-QP filtering many initializations of PID reference control on the DI system. The CBF zero contour drawn on its value heatmap bounds the learned control invariant set.

CBFs are a powerful tool to enforce safety constraints for nonlinear control systems [1], with many successful applications in autonomous driving [2], UAV navigation [3], robot locomotion [4], and safe reinforcement learning [5]. For control-affine nonlinear systems, CBFs can be used to construct a convex quadratic programming (QP)-based safety filter deployed online to safeguard against potentially unsafe control commands. The induced safety filter, denoted as CBF-QP, corrects the reference controller to remain in a safe control invariant set.

While Control Barrier Functions (CBFs) provide an efficient method to ensure safety, finding such functions can be challenging. Specifically, there is no guarantee that the resulting safety filter will remain feasible throughout operation. This is primarily because the “model-free” construction of the filter only incorporates the constraints. Complex constraints, high relative degree, and bounded actuation exacerbate the challenge of ensuring feasibility. Various techniques have been proposed to address these challenges, such as CBF composition for complex constraints [6, 7, 8, 9], higher-order CBFs for high relative degree [10, 11], and integral CBFs [12] for limited actuation. Despite significant progress, these approaches can make the filter overly restrictive, thus limiting performance.

Contributions

We propose a novel self-supervised learning framework for CBF synthesis that systematically addresses all the above challenges. First, we handle complex safety constraints and high relative degree in CBF synthesis by encoding the safety constraints into the CBF parameterization with minimal conservatism. Second, we design a physics-informed training loss function based on Hamilton-Jacobi (HJ) reachability analysis [13] to satisfy bounded actuation while maximizing the learned control invariant set volume. We evaluate our method on the double-integrator and the high-dimensional fixed-wing aircraft system and demonstrate that the proposed method effectively learns a performant CBF even with complex safety constraints. We call the proposed framework Physics-informed Neural Network (PINN)-CBF [14].

1.1 Related work

Complex safety constraints

For a safe set described by Boolean logical operations on multiple constraints, [6] composes multiple CBFs accordingly through the non-smooth min/max operators. [7] introduces smooth compositions of logical operations on constraints, which were later extended to simultaneously handle actuation and state constraints [8, 9] using integral CBFs [12]. [15] proposes an algorithmic way to create a single smooth CBF from arbitrary compositions of both min and max operators. Such smooth bounds have been used in changing environments [16]. Notably, [17] ensures the input-constrained feasibility of the CBF condition while composing multiple CBFs.

High-order CBF

High-order CBFs (HOCBFs) [10] and exponential CBFs [11] are systematic approaches to CBF construction when the safety constraints have high relative degree. However, controlling the conservatism of these approaches is a challenge. To reduce conservatism or improve the performance of HOCBFs, various learning frameworks have been proposed [18, 19, 20] that allow tuning of the class 𝒦\mathcal{K} functions used in the CBF condition.

Learning CBF with input constraints

Motivated by the difficulty of hand-designing CBFs, learning-based approaches building on earlier work [21] have emerged in recent years as an alternative [22, 23, 24, 25]. Liu et al. [26] explicitly consider input constraints in learning a CBF by finding counterexamples on the 0-level set of the CBF for training, while Dai et al. [27] propose a data-efficient prioritized sampling method. [28] explores adaptive sampling for training HJB neural network models that favors regions with sharp gradients. Drawing on tools from reachability analysis, the recent work [29] iteratively expands the volume of the control invariant set by learning the policy value function and improving the performance through policy iteration. Similarly, Dai et al. [30] expand conservative hand-crafted CBFs by learning on unsafe and safe trajectories, Qin et al. [31] apply an actor-critic RL framework to learn a CBF value function, and [32] learns a control policy and CBF together for black-box constraints and dynamics. Our method differs by being controller-independent and basing its learning objective on the HJ partial differential equation (PDE), which defines the maximal control invariant set, without requiring trajectory training data.

HJ reachability-based methods

The value functions in HJ reachability analysis have been extended to construct control barrier-value functions (CBVFs) [33] and control Lyapunov-value functions [34], which can be computed using existing toolboxes [35]. Tonkens et al. [36] further apply such tools to refine existing CBF candidates, and Bansal et al. [37] learn neural network solutions to HJ PDEs for reachability analysis. Of particular interest to our work is the CBVF, which is close to the CBF formulation and provides a characterization of the viability kernel. Our work aims to learn a neural network CBF from data without using computational tools based on spatial discretization.

Notation

An extended class 𝒦\mathcal{K} function is a function α:(b,a)\alpha:(-b,a)\mapsto\mathbb{R} for some a,b>0a,b>0 that is strictly increasing and satisfies α(0)=0\alpha(0)=0. We denote Lr+(h)={xnh(x)r}L_{r}^{+}(h)=\{x\in\mathbb{R}^{n}\mid h(x)\geq r\} as the rr-superlevel set of a continuous function h(x)h(x). The positive and negative parts of a number aa\in\mathbb{R} are denoted by (a)+=max(a,0)(a)_{+}=\max(a,0) and (a)=min(a,0)(a)_{-}=\min(a,0), respectively. Let ,,¬\land,\lor,\neg denote the logical operations of conjunction, disjunction, and negation, respectively. For two statements AA and BB, we have ¬(AB)=(¬A)(¬B)\neg(A\land B)=(\neg A)\lor(\neg B) and ¬(AB)=(¬A)(¬B)\neg(A\lor B)=(\neg A)\land(\neg B).

2 Background and Problem Statement

Consider a continuous time control-affine system:

x˙=f(x)+g(x)u,u𝒰,\displaystyle\dot{x}=f(x)+g(x)u,\quad u\in\mathcal{U}, (1)

where 𝒟n\mathcal{D}\subseteq\mathbb{R}^{n} is the domain of the system, x𝒟x\in\mathcal{D} is the state, and u𝒰mu\in\mathcal{U}\subseteq\mathbb{R}^{m} is the control input, with 𝒰\mathcal{U} denoting the control input constraint set. We assume that f:𝒟nf\colon\mathcal{D}\to\mathbb{R}^{n}, g:𝒟n×mg\colon\mathcal{D}\to\mathbb{R}^{n\times m} are locally Lipschitz continuous and 𝒰\mathcal{U} is a convex polyhedron. We denote the solution of (1) at time t0t\geq 0 by x(t)x(t). Given a set 𝒳𝒟\mathcal{X}\subseteq\mathcal{D} that represents a safe subset of the state space, the general objective of safe control design is to find a control law π(x)\pi(x) that renders 𝒳\mathcal{X} invariant under the closed-loop dynamics x˙=f(x)+g(x)π(x)\dot{x}=f(x)+g(x)\pi(x), i.e., if x(t0)𝒳x(t_{0})\in\mathcal{X} for some t00t_{0}\geq 0, then x(t)𝒳x(t)\in\mathcal{X} for all tt0t\geq t_{0}. A general approach to solving this problem is through control barrier functions.

2.1 Control Barrier Functions

Suppose the safe set is defined by the 0-superlevel set of a smooth function c()c(\cdot) such that 𝒳={xc(x)0}\mathcal{X}=\{x\mid c(x)\geq 0\}. For 𝒳\mathcal{X} to be control invariant, the boundary function c()c(\cdot) must satisfy maxu𝒰c˙(x,u)0\max_{u\in\mathcal{U}}\dot{c}(x,u)\geq 0 when c(x)=0c(x)=0 by Nagumo’s theorem [38]. However, since c()c(\cdot) does not necessarily satisfy this condition, we settle for finding a control invariant set contained in 𝒳\mathcal{X} through CBFs.

Definition 1 (Control barrier function).

Let 𝒮:=L0+(h)𝒳𝒟\mathcal{S}:=L_{0}^{+}(h)\subseteq\mathcal{X}\subseteq\mathcal{D} be the 0-superlevel set of a continuously differentiable function h:𝒟h\colon\mathcal{D}\to\mathbb{R}. Then h()h(\cdot) is a control barrier function for system (1) if there exists an extended class 𝒦\mathcal{K} function α\alpha such that

supu𝒰{Lfh(x)+Lgh(x)u+α(h(x))}0,x𝒟,\sup_{u\in\mathcal{U}}\{L_{f}h(x)+L_{g}h(x)u+\alpha(h(x))\}\geq 0,\forall x\in\mathcal{D}, (2)

where Lfh(x)=h(x)f(x)L_{f}h(x)=\nabla h(x)^{\top}f(x) and Lgh(x)=h(x)g(x)L_{g}h(x)=\nabla h(x)^{\top}g(x) are the Lie derivatives of h(x)h(x).

Given a CBF h()h(\cdot), the set of point-wise safe control actions 𝒦cbf(x)={u𝒰Lfh(x)+Lgh(x)u+α(h(x))0}\mathcal{K}_{\mathrm{cbf}}(x)=\{u\in\mathcal{U}\mid L_{f}h(x)+L_{g}h(x)u+\alpha(h(x))\geq 0\} is non-empty, and any locally Lipschitz continuous controller π(x)𝒦cbf(x)\pi(x)\in\mathcal{K}_{\mathrm{cbf}}(x) renders the set 𝒮\mathcal{S} forward invariant for the closed-loop system. This enables the construction of a minimally-invasive safety filter:

π(x):=argmin𝑢uur(x)22subject to\displaystyle\pi(x):=\underset{u}{\text{argmin}}\ \lVert u-u_{r}(x)\rVert_{2}^{2}\quad\text{subject to} Lfh(x)+Lgh(x)u+α(h(x))0,u𝒰,\displaystyle\quad L_{f}h(x)+L_{g}h(x)u+\alpha(h(x))\geq 0,\ u\in\mathcal{U}, (3)

where ur(x)u_{r}(x) is any given reference but potentially unsafe controller. Under the assumption that 𝒰\mathcal{U} is a polyhedron, 𝒦cbf(x)\mathcal{K}_{\mathrm{cbf}}(x) is also a polyhedral set and problem (3) becomes a convex QP.
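For intuition, when the input is scalar and box-constrained, the feasible set of (3) reduces to an interval and the QP solution is simply a clipping of the reference input. Below is a minimal sketch of this closed-form filter; the function name, the choice α(r)=r\alpha(r)=r, and the numerical values are ours for illustration, not from the paper.

```python
def cbf_qp_filter_1d(u_ref, Lfh, Lgh, h, u_min, u_max, alpha=lambda r: r):
    """Closed-form CBF-QP (3) for a scalar input with box constraints:
    project u_ref onto {u in [u_min, u_max] | Lfh + Lgh*u + alpha(h) >= 0}."""
    b = Lfh + alpha(h)          # constant part of the CBF constraint
    lo, hi = u_min, u_max
    if Lgh > 0:                 # constraint reads u >= -b/Lgh
        lo = max(lo, -b / Lgh)
    elif Lgh < 0:               # constraint reads u <= -b/Lgh
        hi = min(hi, -b / Lgh)
    elif b < 0:                 # Lgh == 0: constraint is b >= 0, independent of u
        raise ValueError("CBF-QP infeasible at this state")
    if lo > hi:
        raise ValueError("CBF-QP infeasible at this state")
    # Projecting u_ref onto [lo, hi] minimizes (u - u_ref)^2 over the feasible set.
    return min(max(u_ref, lo), hi)

# Example: the filter raises an overly aggressive braking command of -1.0 to -0.7.
u_safe = cbf_qp_filter_1d(u_ref=-1.0, Lfh=0.5, Lgh=1.0, h=0.2, u_min=-1.0, u_max=1.0)
print(u_safe)  # -0.7
```

A reference input that already satisfies the constraint passes through unchanged, which is the "minimally-invasive" behavior of the filter.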

2.2 Problem Statement

While the CBF-QP filter is minimally invasive and guarantees x(t)L0+(h)𝒳x(t)\in L_{0}^{+}(h)\subseteq\mathcal{X} for all time, a small L0+(h)L_{0}^{+}(h) essentially limits the ability of the reference control to execute a task. To take the performance of the reference controller into account, we consider the following problem.

Problem 1 (Performance-Oriented CBF).

Given the input-constrained system (1) and a safe set 𝒳\mathcal{X} defined by complex safety constraints (to be specified in Section 3.1), synthesize a CBF h()h(\cdot) with an induced control invariant set 𝒮=L0+(h)\mathcal{S}=L_{0}^{+}(h) such that (i) 𝒮𝒳\mathcal{S}\subseteq\mathcal{X} and (ii) the volume of 𝒮\mathcal{S} is maximized. Formally, this problem can be cast as an infinite-dimensional optimization problem:

maximizeh\displaystyle\underset{h\in\mathcal{H}}{\mathrm{maximize}}\quad volume(L0+(h))(performance)\displaystyle\mathrm{volume}(L_{0}^{+}(h))\quad(\texttt{performance}) (4)
subjectto\displaystyle\mathrm{subject\ to} L0+(h)𝒳(safety)\displaystyle L_{0}^{+}(h)\subseteq\mathcal{X}\quad(\texttt{safety})
supu𝒰{Lfh(x)+Lgh(x)u+α(h(x))}0,x𝒟(control invariance)\displaystyle\sup_{u\in\mathcal{U}}\{L_{f}h(x)+L_{g}h(x)u+\alpha(h(x))\}\!\geq\!0,\ \forall x\in\mathcal{D}\ (\texttt{control invariance)}

where \mathcal{H} is the class of scalar-valued continuously differentiable functions.

3 Proposed Method

In this section, we present our three-step method for learning PINN-CBF:

  1. (S1)

    Composition of complex state constraints: Given multiple state constraints composed by Boolean logic describing the safe set 𝒳\mathcal{X}, we equivalently represent 𝒳\mathcal{X} as the zero superlevel set of a single non-smooth function c()c(\cdot), i.e., 𝒳=L0+(c)={xc(x)0}\mathcal{X}=L_{0}^{+}(c)=\{x\mid c(x)\geq 0\}.

  2. (S2)

    Inner approximation of safe set: Given the constraint function c()c(\cdot) obtained from the previous step, we derive a smooth minorizer c¯()\underline{c}(\cdot) of c()c(\cdot), i.e., c(x)c¯(x)c(x)\geq\underline{c}(x) for all x𝒟x\in\mathcal{D}, s.t. L0+(c¯)𝒳L_{0}^{+}(\underline{c})\subseteq\mathcal{X}.

  3. (S3)

    Learning performance-oriented CBF: To approximate the largest control invariant subset of L0+(c¯)L_{0}^{+}(\underline{c}), we design a training loss function based on control barrier-value functions and HJ PDE [33]. We propose a parameterization of the CBF and a sampling strategy exploiting the structure of the PDE.

3.1 Composition of Complex State Constraints

Suppose we are given NN sets 𝒮i:=L0+(si)={xsi(x)0}\mathcal{S}_{i}:=L_{0}^{+}(s_{i})=\{x\mid s_{i}(x)\geq 0\} where each si:ns_{i}:\mathbb{R}^{n}\mapsto\mathbb{R} is continuously differentiable and the safe set 𝒳\mathcal{X} is described by logical operations on {𝒮i}i=1N\{\mathcal{S}_{i}\}_{i=1}^{N}. Since all Boolean logical operations can be expressed as compositions of the three fundamental operations of conjunction, disjunction, and negation [6, 15], it suffices to demonstrate only the set operations shown below:

  1. 1.

    Conjunction: x𝒮ix\in\mathcal{S}_{i} AND x𝒮jx\in\mathcal{S}_{j} \Leftrightarrow x𝒮i𝒮j={xs~(x):=min(si(x),sj(x))0}x\in\mathcal{S}_{i}\cap\mathcal{S}_{j}=\{x\mid\tilde{s}(x):=\min(s_{i}(x),s_{j}(x))\geq 0\}.

  2. 2.

    Disjunction: x𝒮ix\in\mathcal{S}_{i} OR x𝒮jx\in\mathcal{S}_{j} \Leftrightarrow x𝒮i𝒮j={xs~(x):=max(si(x),sj(x))0}x\in\mathcal{S}_{i}\cup\mathcal{S}_{j}=\{x\mid\tilde{s}(x):=\max(s_{i}(x),s_{j}(x))\geq 0\}.

  3. 3.

    Negation: NOT x𝒮ix\in\mathcal{S}_{i} \Leftrightarrow x𝒮i={xs~(x):=si(x)0}x\in\mathcal{S}_{i}^{\complement}=\{x\mid\tilde{s}(x):=-s_{i}(x)\geq 0\} (complement of 𝒮i\mathcal{S}_{i}).

The conjunction and disjunction of two constraints can be exactly expressed as one constraint composed through the min\min and max\max operator, respectively. Furthermore, negating a constraint si(x)0s_{i}(x)\geq 0 only requires flipping the sign of sis_{i}. These logical operations enable us to capture complex geometries and logical constraints as illustrated in the following two examples.

Example 1 (Complex geometric sets).

Consider x=[x1x2]2x=[x_{1}\ x_{2}]^{\top}\in\mathbb{R}^{2} and two rectangular obstacles given by 𝒪i:={x[aibi]x[cidi]}\mathcal{O}_{i}:=\{x\mid\bigl{[}\begin{smallmatrix}a_{i}\\ b_{i}\end{smallmatrix}\bigr{]}\leq x\leq\bigl{[}\begin{smallmatrix}c_{i}\\ d_{i}\end{smallmatrix}\bigr{]}\} with i=1,2i=1,2. The union of the two rectangular obstacles 𝒪1𝒪2\mathcal{O}_{1}\cup\mathcal{O}_{2} is a nonconvex set. Define the following functions

s1(x)=x1+c1,s2(x)=x1a1,s3(x)=x2+d1,s4(x)=x2b1,\displaystyle s_{1}(x)=-x_{1}+c_{1},s_{2}(x)=x_{1}-a_{1},s_{3}(x)=-x_{2}+d_{1},s_{4}(x)=x_{2}-b_{1},
s5(x)=x1+c2,s6(x)=x1a2,s7(x)=x2+d2,s8(x)=x2b2,\displaystyle s_{5}(x)=-x_{1}+c_{2},s_{6}(x)=x_{1}-a_{2},s_{7}(x)=-x_{2}+d_{2},s_{8}(x)=x_{2}-b_{2},

and let c(x)=max(min(s1(x),s2(x),s3(x),s4(x)),min(s5(x),s6(x),s7(x),s8(x)))c(x)=\max(\min(s_{1}(x),s_{2}(x),s_{3}(x),s_{4}(x)),\min(s_{5}(x),s_{6}(x),s_{7}(x),s_{8}(x))). Then, we have L0+(c)=𝒪1𝒪2L_{0}^{+}(c)=\mathcal{O}_{1}\cup\mathcal{O}_{2}.

Example 2 (Logical constraints).

Consider the three constraints si(x)0,i=1,2,3s_{i}(x)\geq 0,i=1,2,3, at least two of which must be satisfied. This specification is equivalent to the constraint c(x)0c(x)\geq 0, where

c(x)=max(min(s1(x),s2(x)),min(s2(x),s3(x)),min(s1(x),s3(x))).\displaystyle c(x)=\max(\min(s_{1}(x),s_{2}(x)),\min(s_{2}(x),s_{3}(x)),\min(s_{1}(x),s_{3}(x))).

In summary, by composing the min\min and max\max operators, we can construct a level-set function c:nc:\mathbb{R}^{n}\mapsto\mathbb{R} such that x𝒳c(x)0,i.e.,𝒳=L0+(c)x\in\mathcal{X}\Leftrightarrow c(x)\geq 0,\text{i.e.,}\ \mathcal{X}=L_{0}^{+}(c). Being an exact description of 𝒳\mathcal{X}, however, c()c(\cdot) is not smooth. Next, we find a smooth lower bound of c()c(\cdot) that facilitates CBF design.
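As a concrete sketch of Example 1, the composed level-set function can be implemented directly with min\min and max\max; the rectangle parameters below are hypothetical choices of ours.

```python
# Hypothetical rectangle parameters for Example 1: O_i = {x | [a_i, b_i] <= x <= [c_i, d_i]}.
a1, b1, c1, d1 = 0.0, 0.0, 1.0, 1.0   # first rectangle
a2, b2, c2, d2 = 0.5, 0.5, 2.0, 2.0   # second rectangle, overlapping the first

def c_union(x):
    """Non-smooth level-set function with L0+(c) = O1 U O2 (disjunction of conjunctions)."""
    x1, x2 = x
    in_O1 = min(-x1 + c1, x1 - a1, -x2 + d1, x2 - b1)  # conjunction of s1..s4 -> min
    in_O2 = min(-x1 + c2, x1 - a2, -x2 + d2, x2 - b2)  # conjunction of s5..s8 -> min
    return max(in_O1, in_O2)                           # disjunction -> max

print(c_union((1.5, 1.5)) >= 0)  # True: inside O2 only
print(c_union((0.7, 0.7)) >= 0)  # True: inside both rectangles
print(c_union((3.0, 3.0)) >= 0)  # False: outside the union
```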

3.2 Inner Approximation of Safe Set

To find a smooth lower bound c¯()\underline{c}(\cdot) of c()c(\cdot), we utilize the compositional structure of c()c(\cdot). We bound the max\max operators using the log-sum-exponential function as follows [15] (min\min follows similarly):

1βlog(i=1Meβsi)log(M)β\displaystyle\frac{1}{\beta}\log(\sum_{i=1}^{M}e^{\beta s_{i}})-\frac{\log(M)}{\beta} max(s1,,sM)1βlog(i=1Meβsi),\displaystyle\leq\max(s_{1},\cdots,s_{M})\leq\frac{1}{\beta}\log(\sum_{i=1}^{M}e^{\beta s_{i}}), (5)

with β>0\beta>0. As β\beta\rightarrow\infty, these bounds can be made arbitrarily accurate. We note that both the lower and upper bounds in (5) are smooth and strictly increasing in each input sis_{i}. Therefore, to obtain a lower bound c¯(x)\underline{c}(x) of c(x)c(x), it suffices only to compose the lower bounds on each min\min and max\max function. In Figure 2 we show the effect of β\beta on the resulting inner approximation.

Refer to caption
Figure 2: Effect of the smoothing parameter β\beta on the resulting inner approximation, with β{+,10,5,2}\beta\in\{+\infty,10,5,2\}.

3.3 Learning PINN-CBF 

The smooth function c¯()\underline{c}(\cdot), whose 0-superlevel set provides an inner approximation of the safe set 𝒳\mathcal{X}, is not necessarily a CBF. Thus, we aim to find the “closest” CBF approximation of c¯()\underline{c}(\cdot). We learn our Neural Network (NN) model using a Hamilton-Jacobi (HJ) PDE from reachability analysis whose infinite-time-horizon solution precisely characterizes the CBF maximizing the volume of L0+(h)L_{0}^{+}(h). Our neural network model is trained by minimizing the PDE residual, grounding the method in physics and justifying the name PINN-CBF.

Hamilton-Jacobi PDE for reachability

Consider the dynamics (1) in the time interval [t,0][t,0], where t0t\leq 0 and xx are the initial time and state, respectively. Define 𝒰[t,0]\mathcal{U}_{[t,0]} as the set of Lebesgue measurable functions u:[t,0]𝒰u\colon[t,0]\to\mathcal{U}. Let ψ(s):=ψ(s;x,t,u()):[t,0]n\psi(s):=\psi(s;x,t,u(\cdot)):[t,0]\mapsto\mathbb{R}^{n} denote the unique solution of (1) given xx and u()𝒰[t,0]u(\cdot)\in\mathcal{U}_{[t,0]}. Given a bounded Lipschitz continuous function :𝒟\ell:\mathcal{D}\mapsto\mathbb{R}, the viability kernel of L0+()={x(x)0}L_{0}^{+}(\ell)=\{x\mid\ell(x)\geq 0\} is defined as

𝒱(t):={xL0+()u()𝒰[t,0] s.t. s[t,0],ψ(s)L0+()},\mathcal{V}(t):=\{x\in L_{0}^{+}(\ell)\mid\exists u(\cdot)\in\mathcal{U}_{[t,0]}\text{ s.t. }\forall s\in[t,0],\psi(s)\in L_{0}^{+}(\ell)\}, (6)

which is the set of all initial states within L0+()L_{0}^{+}(\ell) from which there exists an admissible control signal u()u(\cdot) that keeps the system trajectory within L0+()L_{0}^{+}(\ell) during the time interval [t,0][t,0]. Solving for the viability kernel can be posed as an optimal control problem, where 𝒱(t)\mathcal{V}(t) can be expressed as the superlevel set of a value function called control barrier-value function (CBVF).

Definition 2 (CBVF [33]).

Given a discount factor γ0\gamma\geq 0, the control barrier-value function Bγ:𝒟×(,0]B_{\gamma}:\mathcal{D}\times(-\infty,0]\mapsto\mathbb{R} is defined as Bγ(x,t):=maxu()𝒰[t,0]mins[t,0]eγ(st)(ψ(s))B_{\gamma}(x,t):=\underset{u(\cdot)\in\mathcal{U}_{[t,0]}}{\max}\ \underset{s\in[t,0]}{\min}e^{\gamma(s-t)}\ell(\psi(s)).

For t0t\leq 0, we have 𝒱(t)={xBγ(x,t)0}\mathcal{V}(t)=\{x\mid B_{\gamma}(x,t)\geq 0\} [33, Proposition 2]. Additionally, BγB_{\gamma} is the unique Lipschitz continuous viscosity solution of the following HJ PDE with terminal condition Bγ(x,0)=(x)B_{\gamma}(x,0)=\ell(x) [33, Theorem 3]:

min{(x)Bγ(x,t),tBγ(x,t)+maxu𝒰xBγ(x,t)(f(x)+g(x)u)+γBγ(x,t)}=0.\min\left\{\ell(x)\!-\!B_{\gamma}(x,t),\frac{\partial}{\partial t}B_{\gamma}(x,t)\!+\!\max_{u\in\mathcal{U}}\nabla_{x}B_{\gamma}(x,t)^{\top}(f(x)\!+\!g(x)u)\!+\!\gamma B_{\gamma}(x,t)\right\}\!=\!0. (7)

Under mild assumptions of Lipschitz continuous dynamics and (x)\ell(x) being a signed distance function, Bγ(x,t)B_{\gamma}(x,t) is differentiable almost everywhere [39]. Furthermore, taking tt\rightarrow-\infty, the steady state solution Bγ(x):=Bγ(x,)B_{\gamma}(x):=B_{\gamma}(x,-\infty) gives us the maximal control invariant set contained in L0+()L_{0}^{+}(\ell) [33, Section II.B]:

𝒩(Bγ,):=min{(x)Bγ(x),maxu𝒰LfBγ(x)+LgBγ(x)u+γBγ(x)}=0,x𝒟\displaystyle\mathcal{N}(B_{\gamma},\ell):=\min\left\{\ell(x)\!-\!B_{\gamma}(x),\max_{u\in\mathcal{U}}L_{f}B_{\gamma}(x)\!+\!L_{g}B_{\gamma}(x)u\!+\!\gamma B_{\gamma}(x)\right\}\!=\!0,\;\forall x\in\mathcal{D} (8)

We will leverage the above PDE to learn a CBF hh whose zero superlevel set approximates the maximal control invariant set.

CBF parameterization In the context of our problem, we are interested in the viability kernel of L0+(c¯(x))L_{0}^{+}(\underline{c}(x)), where c¯\underline{c} is the smoothed composition of the constraints. Thus, our goal is to learn a CBF hh that satisfies the PDE 𝒩(h,c¯)=0\mathcal{N}(h,\underline{c})=0. To this end, we parameterize the CBF candidate as

hθ(x)=c¯(x)δθ(x),h_{\theta}(x)=\underline{c}(x)-\delta_{\theta}(x), (9)

where δθ:𝒟0\delta_{\theta}\colon\mathcal{D}\to\mathbb{R}_{\geq 0} is a non-negative continuously differentiable function approximator with parameters θ\theta. In this paper, we parameterize δθ\delta_{\theta} in the form of a multi-layer perceptron:

δθ(x)=σ+(WLzL+bL),zk+1=σ(Wkzk+bk),k=0,,L1,z0=x,\delta_{\theta}(x)=\sigma_{+}(W_{L}z_{L}+b_{L}),\ z_{k+1}=\sigma(W_{k}z_{k}+b_{k}),k=0,\cdots,L-1,\ z_{0}=x, (10)

where {Wk,bk}k=0L\{W_{k},b_{k}\}_{k=0}^{L} are the learnable parameters, σ()\sigma(\cdot) is a smooth activation such as the ELU, Tanh, or Swish functions [40], and σ+()\sigma_{+}(\cdot) is a smooth function with non-negative outputs, i.e., σ+(r)0,r\sigma_{+}(r)\geq 0,\forall r\in\mathbb{R}. We choose σ+()\sigma_{+}(\cdot) as the Softplus function, σ+(r)=1βlog(1+exp(βr)),β>0\sigma_{+}(r)=\frac{1}{\beta}\log(1+\exp(\beta r)),\;\beta>0.

The parameterization (9) offers several advantages over using a standard MLP model for hθh_{\theta}. First, it accelerates learning by enforcing the constraint c¯(x)hθ(x)\underline{c}(x)\geq h_{\theta}(x) in (8) by design. Second, complex constraints are directly integrated into the CBF candidate via c¯(x)\underline{c}(x), simplifying the learning process. Importantly, hθ(x)h_{\theta}(x) is automatically non-positive inside obstacle regions, allowing the learning process to focus elsewhere.

In summary, the parameterization in (9) ensures that hθ()h_{\theta}(\cdot) is smooth and satisfies hθ(x)c¯(x)c(x)h_{\theta}(x)\leq\underline{c}(x)\leq c(x) for all x𝒟x\in\mathcal{D}, implying that L0+(hθ)L0+(c¯)L0+(c)=𝒳L_{0}^{+}(h_{\theta})\subseteq L_{0}^{+}(\underline{c})\subseteq L_{0}^{+}({c})=\mathcal{X} by construction.

Initialization We propose a specialized initialization scheme by letting WL,bL=0W_{L},b_{L}=0, initializing the candidate hθh_{\theta} with c¯\underline{c}. This initialization favors finding the closest CBF to c¯(x)\underline{c}(x) during training. Setting only the last layer to zero allows training to break symmetry and retain gradient flow.
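A minimal NumPy sketch of the parameterization (9)–(10) with this initialization; the layer sizes, random-initialization scale, and the toy c¯\underline{c} are our own choices. Note that with WL,bL=0W_{L},b_{L}=0 the Softplus output is the small constant log(2)/β\log(2)/\beta, so hθh_{\theta} matches c¯\underline{c} up to that constant at initialization.

```python
import numpy as np

def softplus(r, beta=10.0):
    """sigma_+: smooth and non-negative for all inputs."""
    return np.log1p(np.exp(beta * r)) / beta

def init_mlp(sizes, rng):
    """Random hidden layers; the last layer is zeroed per the proposed initialization,
    so delta_theta starts as the small constant softplus(0) = log(2)/beta everywhere."""
    params = [(rng.normal(scale=0.3, size=(m, n)), np.zeros(m))
              for n, m in zip(sizes[:-1], sizes[1:])]
    W_L, b_L = params[-1]
    params[-1] = (np.zeros_like(W_L), np.zeros_like(b_L))
    return params

def delta_theta(params, x):
    z = x
    for W, b in params[:-1]:
        z = np.tanh(W @ z + b)        # smooth hidden activation
    W_L, b_L = params[-1]
    return softplus(W_L @ z + b_L).item()

c_bar = lambda x: 1.0 - float(x @ x)  # toy smoothed constraint function (our stand-in)
h_theta = lambda params, x: c_bar(x) - delta_theta(params, x)

rng = np.random.default_rng(0)
params = init_mlp([2, 50, 50, 1], rng)
x = rng.normal(size=2)
assert delta_theta(params, x) >= 0.0       # delta_theta is non-negative by design
assert h_theta(params, x) <= c_bar(x)      # hence L0+(h_theta) lies inside L0+(c_bar)
```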

Training loss and sampling distribution Given the parameterization (9), we now propose to train hθh_{\theta} to approximately satisfy the steady-state HJ PDE. This leads to the following physics-informed risk minimization

minimize𝜃J(θ)=𝔼xμ[HJ(x;θ,γ)+λCBF(x;θ,γ)2],λ0.\underset{\theta}{\text{minimize}}\quad J(\theta)=\mathbb{E}_{x\sim\mu}\left[\mathcal{L}_{\text{HJ}}(x;\theta,\gamma)+\lambda\mathcal{L}_{\text{CBF}}(x;\theta,\gamma)^{2}\right],\quad\lambda\geq 0. (11a)
HJ(x;θ,γ)\displaystyle\mathcal{L}_{\text{HJ}}(x;\theta,\gamma) =(min{c¯(x)hθ(x),maxu𝒰Lfhθ(x)+Lghθ(x)u+γhθ(x)})2\displaystyle=\left(\min\left\{\underline{c}(x)-h_{\theta}(x),\ \max_{u\in\mathcal{U}}L_{f}h_{\theta}(x)+L_{g}h_{\theta}(x)u+\gamma h_{\theta}(x)\right\}\right)^{2} (11b)
CBF(x;θ,γ)\displaystyle\mathcal{L}_{\text{CBF}}(x;\theta,\gamma) =max{maxu𝒰Lfhθ(x)Lghθ(x)uγhθ(x),0}.\displaystyle=\max\left\{-\max_{u\in\mathcal{U}}L_{f}h_{\theta}(x)-L_{g}h_{\theta}(x)u-\gamma h_{\theta}(x),0\right\}. (11c)

The first loss HJ(x;θ,γ)\mathcal{L}_{\text{HJ}}(x;\theta,\gamma) is the squared PDE residual, whose minimization drives hθh_{\theta} toward the maximal-volume solution and thus increases the volume of L0+(hθ)L_{0}^{+}(h_{\theta}), while the second loss CBF(x;θ,γ)\mathcal{L}_{\text{CBF}}(x;\theta,\gamma) enforces the CBF condition. We note that the latter is implicitly present in the PDE but not enforced by the corresponding loss HJ(x;θ,γ)\mathcal{L}_{\text{HJ}}(x;\theta,\gamma).

Finally, μ\mu represents a sampling distribution over L0+(c¯)L_{0}^{+}(\underline{c}), meaning no learning is required within obstacle regions. Efficient sampling in these regions can be achieved using intelligent sampling methods like envelope rejection [41] or the random walk Metropolis-Hastings (RWMH) [42], both of which are well-suited for cluttered environments. Notably, the training process remains self-supervised, as only domain points are collected for learning.

In the following, we demonstrate that jointly enforcing HJ\mathcal{L}_{\text{HJ}} and CBF\mathcal{L}_{\text{CBF}} together with intelligent sampling (IS) improves the performance and safety of the learned PINN-CBF. We note that for polyhedral actuation constraint sets, the supremum over uu is achieved at one of the vertices, yielding a closed-form expression. We summarize our self-supervised training method in Algorithm 1.
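The pointwise losses (11b)–(11c), with the supremum over uu evaluated at the vertices of the input polytope, can be sketched as follows; the double-integrator dynamics and the analytic toy candidate hh below are stand-ins of ours, not the trained model.

```python
import numpy as np

def pointwise_loss(x, h, grad_h, c_bar, f, g, U_vertices, gamma=0.5, lam=0.2):
    """Losses (11b)-(11c) at a single state. For a polytopic input set, the
    supremum of the affine-in-u Hamiltonian is attained at one of the vertices."""
    hx, dh = h(x), grad_h(x)
    ham = max(float(dh @ (f(x) + g(x) @ u)) for u in U_vertices) + gamma * hx
    l_hj = min(c_bar(x) - hx, ham) ** 2   # squared HJ PDE residual (11b)
    l_cbf = max(-ham, 0.0)                # CBF-condition violation (11c)
    return l_hj + lam * l_cbf ** 2

# Double-integrator dynamics x_dot = [v, u] with u in [-1, 1] (vertices -1 and +1).
f = lambda x: np.array([x[1], 0.0])
g = lambda x: np.array([[0.0], [1.0]])
U_vertices = [np.array([-1.0]), np.array([1.0])]

# Toy candidate h(x) = 1 - |x|^2 with c_bar = h, so the first residual term is zero.
h = lambda x: 1.0 - float(x @ x)
grad_h = lambda x: -2.0 * x

loss = pointwise_loss(np.array([0.5, 0.2]), h, grad_h, h, f, g, U_vertices)
print(loss)  # 0.0: the CBF condition holds at this state and c_bar - h = 0
```

In training, `x` would range over minibatches sampled from L0+(c¯)L_{0}^{+}(\underline{c}) and the gradient would come from automatic differentiation of the network rather than an analytic `grad_h`.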

Algorithm 1 Training PINN-CBFs
Input: Constraints {ci}\{c_{i}\}, smoothing parameter β>0\beta>0, discount factor γ>0\gamma>0, regularization parameter λ0\lambda\geq 0
Output: Modules δθ()\delta_{\theta}(\cdot) and c¯()\underline{c}(\cdot) s.t. hθ=c¯()δθ()h_{\theta}=\underline{c}(\cdot)-\delta_{\theta}(\cdot)
1. Following sections 3.1 and 3.2, compose constraints {ci}\{c_{i}\} using β\beta to form c¯\underline{c}.
2. Sample training data, X={xiL0+(c¯)}i=1NX=\{x_{i}\in L_{0}^{+}(\underline{c})\}_{i=1}^{N}.
3. Initialize weights and biases of the model δθ\delta_{\theta} randomly with WL,bL=0W_{L},b_{L}=0.
4. ERM: minimize𝜃J^(θ)=1Ni=1N[HJ(xi;θ,γ)+λCBF(xi;θ,γ)2]\underset{\theta}{\text{minimize}}\quad\hat{J}(\theta)=\frac{1}{N}\sum_{i=1}^{N}\left[\mathcal{L}_{\text{HJ}}(x_{i};\theta,\gamma)+\lambda\mathcal{L}_{\text{CBF}}(x_{i};\theta,\gamma)^{2}\right]

4 Experiments

We demonstrate the efficacy of PINN-CBF on a 2D double integrator system and a 7D fixed-wing aircraft system. Experiments are performed on Google Colab, where the PINN-CBF training is performed on a T4 GPU with 15GB RAM and its validation on a CPU with 52GB RAM. The code is available at https://github.com/o4lc/PINN-CBF.

4.1 Experiment Setup

Double Integrator

First, we consider the double integrator benchmark x˙=vv˙=u\dot{x}=v\quad\dot{v}=u, with xx\in\mathbb{R} denoting the position and vv\in\mathbb{R} denoting the velocity. The action uu\in\mathbb{R} represents the acceleration of the system and directly operates on vv.

We generate a complex obstacle configuration consisting of rotated rectangles and unit walls bordering the state space 𝒟={(x,v)1x11,6v6}\mathcal{D}=\{(x,v)\mid-1\leq x\leq 11,-6\leq v\leq 6\} shown in Fig. 2. Then, we parameterize PINN-CBF following Section 3.3 and train it with N=104N=10^{4} uniform samples. At each state in the training data, we compute the values and gradients of c¯\underline{c} using the flexible automatic differentiation and hardware acceleration of JAX [43]. The acceleration uu is bounded by u[1,1]u\in[-1,1]. The nominal controller is a PID controller meant to stabilize the system to a target position (see Section A.1).

Fixed-Wing Aircraft The fixed-wing Dubins plane system has a 7D state and 3D limited-actuation control. Its full dynamics are shown in Appendix A.2. Sampling and training are done as before, except with N=106N=10^{6} to compensate for the higher-dimensional space 𝒟={(n,e,d,ϕ,θ,ψ,VT)6n,e6,1d11,2πϕ,θ,ψ2π,0.5VT2}\mathcal{D}=\{(n,e,d,\phi,\theta,\psi,V_{T})\mid-6\leq n,e\leq 6,-1\leq d\leq 11,-2\pi\leq\phi,\theta,\psi\leq 2\pi,0.5\leq V_{T}\leq 2\}. Although 106(104)7=101410^{6}\ll(\sqrt{10^{4}})^{7}=10^{14}, the sample size suggested by naive dimensional scaling of the 2D case, we aim to demonstrate how learning can overcome the curse of dimensionality. The action u={AT,p,q}u=\{A_{T},p,q\} is bounded in magnitude by |u|[10.5,1,1]|u|\leq[10.5,1,1]. Rectangular obstacles are placed in the position coordinates n,e,dn,e,d. We chose a nominal trajectory resembling takeoff to evaluate the PINN-CBF filter against relevant baselines on collision avoidance.

4.2 Results

Double-Integrator

We evaluate combinations of the loss functions (11b) and (11c) with intelligent sampling (IS) from Algorithm 1, as shown in Table 1. All proposed ablations and baseline methods use the same architecture (9) for a fair comparison, accommodating complex input-space constraints. The MLP for δθ\delta_{\theta} has layer widths of 2-50-50-1, while DeepReach [37] includes time as an additional input and is applied over the full horizon and evaluated at t=0t=0.

We validate on 10510^{5} grid samples covering the effective obstacle-free space L0+(c¯)L_{0}^{+}(\underline{c}) using three metrics. The residual error |𝒩(h,c¯)|\lvert\mathcal{N}(h,\underline{c})\rvert denotes how close the learned CBF is to the maximal-invariant solution of the HJ PDE, CBF\mathcal{L}_{\text{CBF}} measures safety violation, and Vol(L0+(hθ))/Vol(L0+(c¯))\mathrm{Vol}(L_{0}^{+}(h_{\theta}))/\mathrm{Vol}(L_{0}^{+}(\underline{c})) represents the volume ratio between the learned invariant set and the smoothed safe set, which serves as an upper bound enforced by the architecture.

Method Mean |𝒩(h,c¯)|\lvert\mathcal{N}(h,\underline{c})\rvert over L0+(c¯)L_{0}^{+}(\underline{c}) Mean CBF\mathcal{L}_{\text{CBF}} over L0+(c¯)L_{0}^{+}(\underline{c}) Vol(L0+(hθ))/Vol(L0+(c¯))\mathrm{Vol}(L_{0}^{+}(h_{\theta}))/\mathrm{Vol}(L_{0}^{+}(\underline{c}))
HJ\mathcal{L}_{\text{HJ}} 0.0260.026 0.0160.016 0.5660.566
HJ+λCBF2\mathcal{L}_{\text{HJ}}+\lambda\mathcal{L}_{\text{CBF}}^{2} 0.0260.026 0.0150.015 0.5470.547
HJ+λCBF2\mathcal{L}_{\text{HJ}}+\lambda\mathcal{L}_{\text{CBF}}^{2} (IS) 0.0220.022 0.0110.011 0.6190.619
CBF2\mathcal{L}_{\text{CBF}}^{2} [22] 2.0022.002 8.057×1058.057\times 10^{-5} 0.00.0
NCBF [26] 0.1780.178 0.1770.177 1.01.0
DeepReach [37] 0.2060.206 0.1950.195 0.9840.984
Table 1: Comparison of HJ\mathcal{L}_{\text{HJ}}-based ablations (top rows) against baselines (bottom rows) for the 2D double-integrator system.

In the ablations, scalarization with λ=0.2\lambda=0.2 reduces safety violations but decreases volume. In contrast, intelligent sampling lowers residuals and safety violations while increasing volume, suggesting it mitigates the tradeoff of scalarization. Among the baselines, directly enforcing the CBF condition (2) with CBF2\mathcal{L}_{\text{CBF}}^{2} results in a highly safe but ineffective filter with zero volume. NCBF and DeepReach achieve high volume but with significantly higher safety errors. Next, we will examine how these methods perform on a more complex fixed-wing plane system.

Fixed-Wing Plane The same comparisons conducted for the double integrator are performed with the following adjustments: all methods use a 7-100-100-1 MLP architecture, DeepReach includes an additional input for time, and validation is carried out on 10710^{7} uniformly sampled random states instead of grid samples.

| Method | Mean $\lvert\mathcal{N}(h,\underline{c})\rvert$ over $L_0^+(\underline{c})$ | Mean $\mathcal{L}_{\text{CBF}}$ over $L_0^+(\underline{c})$ | $\mathrm{Vol}(L_0^+(h_\theta))/\mathrm{Vol}(L_0^+(\underline{c}))$ |
| --- | --- | --- | --- |
| $\mathcal{L}_{\text{HJ}}$ | 0.345 | 0.088 | 0.707 |
| $\mathcal{L}_{\text{HJ}}+\lambda\mathcal{L}_{\text{CBF}}^2$ | 0.345 | 0.082 | 0.702 |
| $\mathcal{L}_{\text{HJ}}+\lambda\mathcal{L}_{\text{CBF}}^2$ (IS) | 0.339 | 0.081 | 0.700 |
| $\mathcal{L}_{\text{CBF}}^2$ [22] | 5.794 | $2.731\times10^{-4}$ | $8.400\times10^{-6}$ |
| NCBF [26] | 0.741 | 0.240 | 0.579 |
| DeepReach [37] | 0.543 | 0.500 | 0.951 |

Table 2: Comparison of $\mathcal{L}_{\text{HJ}}$ methods and our ablations (top three rows) against baselines (bottom three rows) for the 7D fixed-wing aircraft system.

Training with $\mathcal{L}_{\text{CBF}}^2$ is ineffective, yielding nearly zero volume. The performance of NCBF degrades severely due to the increased difficulty of sampling counterexamples in higher dimensions, while DeepReach achieves higher volume at the cost of increased safety violations. Our method, PINN-CBF, achieves both low conservatism and safety. To illustrate this, we visualize the performance of PINN-CBF, NCBF, and DeepReach in filtering a nominal trajectory representing takeoff.
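For context, the CBF-QP filter applied in these rollouts projects the reference input onto the set of inputs satisfying the CBF condition. In the simplified case of a single affine constraint and no input bounds, this QP admits a closed form. The sketch below illustrates only that special case (the paper's filter additionally enforces actuation limits, which would require a numerical QP solver); `a` and `b` are hypothetical names for the constraint data $L_g h(x)$ and $-\alpha(h(x)) - L_f h(x)$.

```python
import numpy as np

def cbf_qp_filter(u_ref, a, b):
    """Closed-form solution of min ||u - u_ref||^2  s.t.  a @ u >= b.

    This is the unconstrained-input special case of the CBF-QP; with box
    actuation limits, a numerical QP solver would be needed instead.
    """
    slack = a @ u_ref - b
    if slack >= 0.0:
        return u_ref                       # reference input is already safe
    return u_ref - (slack / (a @ a)) * a   # project onto the constraint boundary

# An unsafe reference input gets pushed onto the half-space boundary.
u = cbf_qp_filter(np.array([0.0, 0.0]), a=np.array([1.0, 0.0]), b=2.0)
```

When the reference control already satisfies the constraint, the filter is inactive and returns it unchanged, which is why the CBF-QP is described as minimally invasive.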

Fixed-Wing Aircraft Example In Fig. 3, the fixed-wing aircraft starting from the given position collides with a 3D obstacle, while in Fig. 4 the obstacles are avoided under the PINN-CBF safety filter.

Figure 3: Nominal reference trajectory illustrating a crash. The red arrow indicates the direction of velocity.
Figure 4: Agile takeoff avoiding obstacles under actuation limits: NCBF is conservative, DeepReach collides, and PINN-CBF balances both. A video is linked here.

5 Conclusion and Limitations

We introduced a self-supervised framework for learning control barrier functions (CBFs) for limited-actuation systems with complex safety constraints. Our approach maximizes the control invariant set volume through physics-informed learning on a Hamilton-Jacobi (HJ) PDE characterizing the viability kernel. Additionally, we proposed a neural CBF parameterization that leverages the PDE structure.

A key limitation of learning-based CBFs is the lack of formal guarantees on the resulting control invariant set, along with the large number of samples required, a drawback shared by physics-informed methods in general. Our intelligent sampling (IS) strategy mitigates the sample burden to some extent, but the approach remains restricted to a specific domain and obstacle configuration. Future work will explore domain-adaptive CBFs that generalize across different obstacle settings, building on recent advances in parameterized physics-informed techniques [44], and will further investigate the links between CBF learning, HJ reachability analysis, and control Lyapunov value functions.

References

  • Ames et al. [2019] A. D. Ames, S. Coogan, M. Egerstedt, G. Notomista, K. Sreenath, and P. Tabuada. Control barrier functions: Theory and applications. In 2019 18th European control conference (ECC), pages 3420–3431. IEEE, 2019.
  • Xiao et al. [2021] W. Xiao, N. Mehdipour, A. Collin, A. Y. Bin-Nun, E. Frazzoli, R. D. Tebbens, and C. Belta. Rule-based optimal control for autonomous driving. In Proceedings of the ACM/IEEE 12th International Conference on Cyber-Physical Systems, pages 143–154, 2021.
  • Xu and Sreenath [2018] B. Xu and K. Sreenath. Safe teleoperation of dynamic uavs through control barrier functions. In 2018 IEEE International Conference on Robotics and Automation (ICRA), pages 7848–7855. IEEE, 2018.
  • Grandia et al. [2021] R. Grandia, A. J. Taylor, A. D. Ames, and M. Hutter. Multi-layered safety for legged robots via control barrier functions and model predictive control. In 2021 IEEE International Conference on Robotics and Automation (ICRA), pages 8352–8358. IEEE, 2021.
  • Marvi and Kiumarsi [2021] Z. Marvi and B. Kiumarsi. Safe reinforcement learning: A control barrier function optimization approach. International Journal of Robust and Nonlinear Control, 31(6):1923–1940, 2021.
  • Glotfelter et al. [2017] P. Glotfelter, J. Cortés, and M. Egerstedt. Nonsmooth barrier functions with applications to multi-robot systems. IEEE control systems letters, 1(2):310–315, 2017.
  • Lindemann and Dimarogonas [2018] L. Lindemann and D. V. Dimarogonas. Control barrier functions for signal temporal logic tasks. IEEE control systems letters, 3(1):96–101, 2018.
  • Rabiee and Hoagg [2023] P. Rabiee and J. B. Hoagg. Soft-minimum barrier functions for safety-critical control subject to actuation constraints. In 2023 American Control Conference (ACC), pages 2646–2651. IEEE, 2023.
  • Rabiee and Hoagg [2025] P. Rabiee and J. B. Hoagg. Soft-minimum and soft-maximum barrier functions for safety with actuation constraints. Automatica, 171:111921, 2025.
  • Xiao and Belta [2021] W. Xiao and C. Belta. High-order control barrier functions. IEEE Transactions on Automatic Control, 67(7):3655–3662, 2021.
  • Nguyen and Sreenath [2016] Q. Nguyen and K. Sreenath. Exponential control barrier functions for enforcing high relative-degree safety-critical constraints. In 2016 American Control Conference (ACC), pages 322–328. IEEE, 2016.
  • Ames et al. [2020] A. D. Ames, G. Notomista, Y. Wardi, and M. Egerstedt. Integral control barrier functions for dynamically defined control laws. IEEE control systems letters, 5(3):887–892, 2020.
  • Bansal et al. [2017] S. Bansal, M. Chen, S. Herbert, and C. J. Tomlin. Hamilton-jacobi reachability: A brief overview and recent advances. In 2017 IEEE 56th Annual Conference on Decision and Control (CDC), pages 2242–2253. IEEE, 2017.
  • Karniadakis et al. [2021] G. E. Karniadakis, I. G. Kevrekidis, L. Lu, P. Perdikaris, S. Wang, and L. Yang. Physics-informed machine learning. Nature Reviews Physics, 3(6):422–440, 2021.
  • Molnar and Ames [2023] T. G. Molnar and A. D. Ames. Composing control barrier functions for complex safety specifications. arXiv preprint arXiv:2309.06647, 2023.
  • Safari and Hoagg [2023] A. Safari and J. B. Hoagg. Time-varying soft-maximum control barrier functions for safety in an a priori unknown environment. arXiv preprint arXiv:2310.05261, 2023.
  • Breeden and Panagou [2023] J. Breeden and D. Panagou. Compositions of multiple control barrier functions under input constraints. In 2023 American Control Conference (ACC), pages 3688–3695. IEEE, 2023.
  • Xiao et al. [2023a] W. Xiao, T.-H. Wang, R. Hasani, M. Chahine, A. Amini, X. Li, and D. Rus. Barriernet: Differentiable control barrier functions for learning of safe robot control. IEEE Transactions on Robotics, 2023a.
  • Xiao et al. [2023b] W. Xiao, C. G. Cassandras, and C. A. Belta. Learning feasibility constraints for control barrier functions. arXiv preprint arXiv:2303.09403, 2023b.
  • Ma et al. [2022] H. Ma, B. Zhang, M. Tomizuka, and K. Sreenath. Learning differentiable safety-critical control using control barrier functions for generalization to novel environments. In 2022 European Control Conference (ECC), pages 1301–1308. IEEE, 2022.
  • Djeridane and Lygeros [2006] B. Djeridane and J. Lygeros. Neural approximation of pde solutions: An application to reachability computations. In Proceedings of the 45th IEEE Conference on Decision and Control, pages 3034–3039. IEEE, 2006.
  • Dawson et al. [2023] C. Dawson, S. Gao, and C. Fan. Safe control with learned certificates: A survey of neural lyapunov, barrier, and contraction methods for robotics and control. IEEE Transactions on Robotics, 2023.
  • Robey et al. [2020] A. Robey, H. Hu, L. Lindemann, H. Zhang, D. V. Dimarogonas, S. Tu, and N. Matni. Learning control barrier functions from expert demonstrations. In 2020 59th IEEE Conference on Decision and Control (CDC), pages 3717–3724. IEEE, 2020.
  • Dawson et al. [2022] C. Dawson, Z. Qin, S. Gao, and C. Fan. Safe nonlinear control using robust neural lyapunov-barrier functions. In Conference on Robot Learning, pages 1724–1735. PMLR, 2022.
  • Qin et al. [2021] Z. Qin, K. Zhang, Y. Chen, J. Chen, and C. Fan. Learning safe multi-agent control with decentralized neural barrier certificates. arXiv preprint arXiv:2101.05436, 2021.
  • Liu et al. [2023] S. Liu, C. Liu, and J. Dolan. Safe control under input limits with neural control barrier functions. In Conference on Robot Learning, pages 1970–1980. PMLR, 2023.
  • Dai et al. [2023] B. Dai, H. Huang, P. Krishnamurthy, and F. Khorrami. Data-efficient control barrier function refinement. In 2023 American Control Conference (ACC), pages 3675–3680. IEEE, 2023.
  • Nakamura-Zimmerer et al. [2021] T. Nakamura-Zimmerer, Q. Gong, and W. Kang. Adaptive deep learning for high-dimensional hamilton–jacobi–bellman equations. SIAM Journal on Scientific Computing, 43(2):A1221–A1247, 2021.
  • So et al. [2023] O. So, Z. Serlin, M. Mann, J. Gonzales, K. Rutledge, N. Roy, and C. Fan. How to train your neural control barrier function: Learning safety filters for complex input-constrained systems. arXiv preprint arXiv:2310.15478, 2023.
  • Dai et al. [2023] B. Dai, P. Krishnamurthy, and F. Khorrami. Learning a better control barrier function under uncertain dynamics. arXiv preprint arXiv:2310.04795, 2023.
  • Tan et al. [2023] D. C. Tan, F. Acero, R. McCarthy, D. Kanoulas, and Z. A. Li. Your value function is a control barrier function: Verification of learned policies using control theory. arXiv preprint arXiv:2306.04026, 2023.
  • Qin et al. [2022] Z. Qin, D. Sun, and C. Fan. Sablas: Learning safe control for black-box dynamical systems. IEEE Robotics and Automation Letters, 7(2):1928–1935, 2022.
  • Choi et al. [2021] J. J. Choi, D. Lee, K. Sreenath, C. J. Tomlin, and S. L. Herbert. Robust control barrier–value functions for safety-critical control. In 2021 60th IEEE Conference on Decision and Control (CDC), pages 6814–6821. IEEE, 2021.
  • Gong et al. [2022] Z. Gong, M. Zhao, T. Bewley, and S. Herbert. Constructing control lyapunov-value functions using hamilton-jacobi reachability analysis. IEEE Control Systems Letters, 7:925–930, 2022.
  • Mitchell and Templeton [2005] I. M. Mitchell and J. A. Templeton. A toolbox of hamilton-jacobi solvers for analysis of nondeterministic continuous and hybrid systems. In International workshop on hybrid systems: computation and control, pages 480–494. Springer, 2005.
  • Tonkens and Herbert [2022] S. Tonkens and S. Herbert. Refining control barrier functions through hamilton-jacobi reachability. In 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 13355–13362. IEEE, 2022.
  • Bansal and Tomlin [2021] S. Bansal and C. J. Tomlin. Deepreach: A deep learning approach to high-dimensional reachability. In 2021 IEEE International Conference on Robotics and Automation (ICRA), pages 1817–1824. IEEE, 2021.
  • Nagumo [1942] M. Nagumo. Über die lage der integralkurven gewöhnlicher differentialgleichungen. Proceedings of the Physico-Mathematical Society of Japan. 3rd Series, 24:551–559, 1942.
  • Wabersich et al. [2023] K. P. Wabersich, A. J. Taylor, J. J. Choi, K. Sreenath, C. J. Tomlin, A. D. Ames, and M. N. Zeilinger. Data-driven safety filters: Hamilton-jacobi reachability, control barrier functions, and predictive methods for uncertain systems. IEEE Control Systems Magazine, 43(5):137–177, 2023.
  • Ramachandran et al. [2017] P. Ramachandran, B. Zoph, and Q. V. Le. Searching for activation functions. arXiv preprint arXiv:1710.05941, 2017.
  • Gilks and Wild [1992] W. R. Gilks and P. Wild. Adaptive rejection sampling for gibbs sampling. Journal of the Royal Statistical Society: Series C (Applied Statistics), 41(2):337–348, 1992.
  • Metropolis et al. [1953] N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller. Equation of state calculations by fast computing machines. The journal of chemical physics, 21(6):1087–1092, 1953.
  • Bradbury et al. [2018] J. Bradbury, R. Frostig, P. Hawkins, M. J. Johnson, C. Leary, D. Maclaurin, G. Necula, A. Paszke, J. VanderPlas, S. Wanderman-Milne, and Q. Zhang. JAX: composable transformations of Python+NumPy programs, 2018. URL http://github.com/google/jax.
  • Cho et al. [2024] W. Cho, M. Jo, H. Lim, K. Lee, D. Lee, S. Hong, and N. Park. Parameterized physics-informed neural networks for parameterized PDEs. In Forty-first International Conference on Machine Learning, 2024.
  • Molnar et al. [2024] T. G. Molnar, S. K. Kannan, J. Cunningham, K. Dunlap, K. L. Hobbs, and A. D. Ames. Collision avoidance and geofencing for fixed-wing aircraft with control barrier functions. arXiv preprint arXiv:2403.02508, 2024.

Appendix A Appendix

A.1 Reference Control

The reference control for the double-integrator (DI) system obeys the PID control law

$$u_{\mathrm{nom}}(t)=K_{p}e(t)+K_{i}\int_{0}^{t}e(\tau)\,d\tau+K_{d}\frac{de(t)}{dt}$$

where $e(t)$ is the error between the current state and the target state at time $t$. For our experiments, we use the gains $K_p^x=-1.2$, $K_p^v=-2.0$, $K_d^x=0.1$, $K_d^v=0.1$, with $K_i^x=K_i^v=0$.
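With $K_i=0$, the law above reduces to PD control. A minimal sketch with the stated gains follows; the finite-difference derivative, the time step, and the function name are implementation assumptions of ours, not details given in the paper.

```python
import numpy as np

# Gains from the paper; integral gains are zero, so this is effectively PD control.
KP = np.array([-1.2, -2.0])   # K_p^x, K_p^v
KD = np.array([0.1, 0.1])     # K_d^x, K_d^v

def pid_reference(state, target, prev_error, dt):
    """PID reference control for the 2D double integrator (state = [x, v]).

    Returns the scalar control and the current error, which the caller feeds
    back in as prev_error on the next step for the derivative term.
    """
    error = state - target
    d_error = (error - prev_error) / dt   # finite-difference derivative (our assumption)
    u = KP @ error + KD @ d_error         # K_i = 0, so no integral term
    return u, error

# First step with prev_error equal to the current error, so the derivative term is zero.
state = np.array([1.0, 0.5])
u, e = pid_reference(state, np.zeros(2), prev_error=state.copy(), dt=0.01)
```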

For the fixed-wing aircraft (F16), we hand-design a reference control policy resembling takeoff. To test the filtering abilities of PINN-CBF, we obstruct the trajectory with rectangular obstacles and place walls that restrict the safe spatial coordinates to a box. In Figure 4, filtering demonstrates agile planning and obstacle avoidance. The following is an example control policy for takeoff while banking.

$$A_T(t)=\frac{a}{8}t+1,\qquad P(t)=\frac{p}{4}\sin(2t),\qquad Q(t)=\frac{q}{5}\cos(t),\qquad x(0)=[0,1,2,0,0,\pi,1]$$
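This open-loop policy transcribes directly into code. The constants $a$, $p$, $q$ are left unspecified in the paper, so the default values below are placeholders.

```python
import numpy as np

def takeoff_controls(t, a=1.0, p=1.0, q=1.0):
    """Open-loop takeoff-while-banking reference controls [A_T, P, Q] at time t.

    a, p, q are unspecified scaling constants; the defaults are placeholders.
    """
    A_T = (a / 8.0) * t + 1.0        # tangential-acceleration ramp
    P = (p / 4.0) * np.sin(2.0 * t)  # roll-rate oscillation (banking)
    Q = (q / 5.0) * np.cos(t)        # pitch-rate oscillation
    return np.array([A_T, P, Q])

x0 = np.array([0.0, 1.0, 2.0, 0.0, 0.0, np.pi, 1.0])  # initial state from the paper
u0 = takeoff_controls(0.0)
```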

A.2 3D Dubins Fixed-Wing Aircraft System Dynamics

Following [45], the dynamics of the Dubins fixed-wing aircraft (F16) system are given by:

$$\begin{aligned}
\dot{n} &= V_T\cos\psi\cos\theta, & \dot{e} &= V_T\sin\psi\cos\theta, \\
\dot{d} &= -V_T\sin\theta, & \dot{\phi} &= P+\sin\phi\tan\theta\,Q+\cos\phi\tan\theta\,R, \\
\dot{\theta} &= \cos\phi\,Q-\sin\phi\,R, & \dot{\psi} &= \frac{\sin\phi}{\cos\theta}Q+\frac{\cos\phi}{\cos\theta}R, \\
\dot{V}_T &= A_T, & R &= \frac{g_D}{V_T}\sin\phi\cos\theta
\end{aligned} \tag{12}$$

where $[n\ e\ d\ \phi\ \theta\ \psi\ V_T]^{\top}$ are the states and $[A_T\ P\ Q]^{\top}$ are the controls. In particular, $n,e,d$ are the position coordinates, $V_T$ is the tangential velocity of the aircraft, $\phi,\theta,\psi$ are the Euler angles orienting the plane, $A_T$ is the tangential acceleration control input, $P$ and $Q$ are rotational control inputs, and $g_D$ is the gravitational acceleration.
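A direct transcription of the dynamics (12) is sketched below, with the yaw rate $R$ computed from the coordinated-turn relation in the last line of the equation; the numerical value used for $g_D$ is our assumption.

```python
import numpy as np

G_D = 9.81  # gravitational acceleration in m/s^2 (value assumed)

def f16_dynamics(x, u):
    """Dubins fixed-wing dynamics, Eq. (12).

    x = [n, e, d, phi, theta, psi, V_T], u = [A_T, P, Q]; returns x_dot.
    """
    n, e, d, phi, theta, psi, V_T = x
    A_T, P, Q = u
    # Yaw rate R is determined by the coordinated-turn relation, not a control.
    R = (G_D / V_T) * np.sin(phi) * np.cos(theta)
    return np.array([
        V_T * np.cos(psi) * np.cos(theta),                    # n_dot
        V_T * np.sin(psi) * np.cos(theta),                    # e_dot
        -V_T * np.sin(theta),                                 # d_dot
        P + np.sin(phi) * np.tan(theta) * Q
          + np.cos(phi) * np.tan(theta) * R,                  # phi_dot
        np.cos(phi) * Q - np.sin(phi) * R,                    # theta_dot
        (np.sin(phi) * Q + np.cos(phi) * R) / np.cos(theta),  # psi_dot
        A_T,                                                  # V_T_dot
    ])

# Sanity check: level flight heading north only advances n and V_T.
x = np.array([0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 10.0])
xdot = f16_dynamics(x, np.array([1.0, 0.0, 0.0]))
```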