
Feedback Strategies for Hypersonic Pursuit of a Ground Evader

Yoonjae Lee     Efstathios Bakolas     Maruthi R. Akella
Aerospace Engineering and Engineering Mechanics
The University of Texas at Austin
Austin, TX 78712
yol033@utexas.edu     bakolas@austin.utexas.edu     makella@mail.utexas.edu
Abstract

In this paper, we present a game-theoretic feedback terminal guidance law for an autonomous, unpowered hypersonic pursuit vehicle that seeks to intercept an evading ground target whose motion is constrained in a one-dimensional space. We formulate this problem as a pursuit-evasion game whose saddle point solution is in general difficult to compute onboard the hypersonic vehicle due to its highly nonlinear dynamics. To overcome this computational complexity, we linearize the nonlinear hypersonic dynamics around a reference trajectory and subsequently utilize feedback control design techniques from Linear Quadratic Differential Games (LQDGs). In our proposed guidance algorithm, the hypersonic vehicle computes its open-loop optimal state and input trajectories off-line and prior to the commencement of the game. These trajectories are then used to linearize the nonlinear equations of hypersonic motion. Subsequently, using this linearized system model, we formulate an auxiliary two-player zero-sum LQDG which is effective in the neighborhood of the given reference trajectory and derive its feedback saddle point strategy that allows the hypersonic vehicle to modify its trajectory online in response to the target’s evasive maneuvers. We provide numerical simulations to showcase the performance of our proposed guidance law.

1 Introduction

Trajectory optimization for hypersonic re-entry vehicles is in general a challenging problem that does not admit analytic solutions due to the vehicle’s highly nonlinear dynamics. For this reason, most research efforts in this area have focused on developing or enhancing numerical optimal control techniques, including direct [1] and indirect [2] methods, for efficient trajectory optimization. In the presence of parametric uncertainty or external disturbances, however, trajectories pre-computed offline may no longer be optimal during flight, which raises the need for a hypersonic vehicle to have an inner-loop feedback guidance system for stabilization or tracking. Traditionally, neighboring optimal (or extremal) control [3] has played an important role in aerospace applications and in the control of nonlinear systems subject to perturbations in their initial states or terminal constraints. For applications of neighboring optimal control to hypersonic trajectory design, one can refer to, for instance, [4, 5].

Robust control and guidance problems for hypersonic vehicles have been addressed by several different methods in the literature, including LQR-based feedback guidance [6], desensitized optimal control [7], and deep learning methods [8], to name but a few. Most of the recent work in hypersonic guidance, however, focuses on steering a hypersonic vehicle to a target that is either static or moves in a known/deterministic fashion but is not capable of maneuvering or evading. If the target can evade, the hypersonic trajectory optimization problem evolves into a differential game [9], in which we are interested in finding a feedback strategy for the hypersonic vehicle to reshape its trajectory on the fly in response to the evading actions of the target.

Differential game theory, first proposed by Isaacs in his pioneering work [9], has been widely used in aerospace, defense, and robotics applications as a framework to model adversarial interactions between two or more players. Pursuit-evasion games, a special class of differential games in which a group of players attempts to capture another group of players, provide a powerful tool to obtain worst-case strategies for intercepting a maneuvering target [10, 11]. Applying differential game theory to the problem of intercepting an evader by a hypersonic pursuer, however, has not yet been proposed in the relevant literature, to the best of our knowledge. This is due to the analytical and computational complexity of the problem, which can be attributed to the nonlinear dynamics of the hypersonic vehicle and the dependence of the atmospheric density upon the altitude. Although numerical methods for computing an open-loop representation of the feedback saddle point solution in two-player differential games have been well studied [12], such open-loop solutions can be ineffective against the target’s unexpected evasion.

The main contribution of this work is a novel formulation of the problem of intercepting a maneuvering ground target by a hypersonic pursuit vehicle as a tractable pursuit-evasion game. After solving for the target’s optimal evasion strategy based on a simplified version of the original nonlinear differential game, we reduce the latter game to an optimal control problem which can readily be solved with existing numerical techniques (either direct or indirect). Thereafter, we linearize the nonlinear hypersonic dynamics around the reference game trajectory and in turn formulate an auxiliary differential game whose feedback saddle point solution can be obtained analytically and computed quickly (in real time) and efficiently compared to the original nonlinear pursuit-evasion game.

The rest of this paper is structured as follows. In Section 2, the dynamical system models of an unpowered hypersonic pursuit vehicle and a maneuvering ground target are introduced to formulate a pursuit-evasion game. In Section 3, an approximation of the saddle point strategy of the target is computed and subsequently, the pursuit-evasion game is reduced to an optimal control problem whose solution will serve as a reference pursuit trajectory. In Section 4, the game dynamics are linearized with respect to the reference trajectory and an auxiliary two-player zero-sum LQDG is constructed. In Section 5, numerical simulations are presented. Lastly, in Section 6, some concluding remarks are provided.

2 Two-Player Pursuit-Evasion Game

In this section, the state space models that describe the motion of a hypersonic pursuit vehicle ($P$) and a maneuvering ground target ($T$) are introduced. Subsequently, the corresponding interception problem is formulated as a pursuit-evasion game.

2.1 Problem setup and system models

The game space, denoted by $\mathcal{X}\subset\mathbb{R}^2$, is defined as a vertical plane with the horizontal axis being the downrange and the vertical axis being the altitude, where $\mathcal{X}:=\{(x,h)\in\mathbb{R}^2 : h\geq 0\}$. Let $\bm{\mathcal{X}}_P:=\mathcal{X}\times\mathbb{R}_{\geq 0}\times\mathbb{R}$ be a state space and $\mathcal{U}_P:=[-\pi/18,\pi/18]$ denote a compact set of admissible inputs; then the non-affine, nonlinear equations of planar hypersonic motion of $P$ in $\mathcal{X}$ are given by

\[
\dot{\bm{x}}_P = \bm{f}_P(\bm{x}_P, u_P), \qquad \bm{x}_P(0) = \bm{x}_P^0, \tag{1}
\]

where $\bm{x}_P\in\bm{\mathcal{X}}_P$ (resp., $\bm{x}_P^0\in\bm{\mathcal{X}}_P$) is the state (resp., initial state), $u_P\in\mathcal{U}_P$ is the control input, and $\bm{f}_P:\bm{\mathcal{X}}_P\times\mathcal{U}_P\rightarrow\bm{\mathcal{X}}_P$ is the vector field of $P$. In particular, the components of $\bm{x}_P$ evolve according to

\begin{align}
\dot{x}_P &= v_P\cos\gamma_P, \tag{2}\\
\dot{h}_P &= v_P\sin\gamma_P, \tag{3}\\
\dot{v}_P &= -\frac{D}{m} - g\sin\gamma_P, \tag{4}\\
\dot{\gamma}_P &= \frac{L}{m v_P} - \frac{g\cos\gamma_P}{v_P}, \tag{5}
\end{align}

where $x_P$, $h_P$, $v_P$, and $\gamma_P$ denote the horizontal and vertical position, velocity, and flight path angle of $P$, respectively. Furthermore, $m$ is the vehicle mass, $g$ is the gravitational acceleration, and $L$ and $D$ are the lift and drag forces, which satisfy

\begin{align}
L &= \tfrac{1}{2}\rho S v_P^2 C_L, \tag{6}\\
D &= \tfrac{1}{2}\rho S v_P^2 C_D. \tag{7}
\end{align}

Here, $S$ is the vehicle reference area and $\rho$ is an exponential function that approximates the atmospheric density in terms of $P$'s vertical position, that is,

\[
\rho = \rho_0 e^{-h_P/H}, \tag{8}
\]

where $\rho_0$ is the surface air density and $H$ is the scale height. The lift and drag coefficients, $C_L$ and $C_D$, are known functions of the angle of attack $\alpha$:

\begin{align}
C_L &= C_{L,1}\alpha, \tag{9}\\
C_D &= C_{D,0} + C_{D,2}\alpha^2, \tag{10}
\end{align}

where the angle of attack $\alpha$ is the control input of $P$, i.e., $u_P=\alpha$. Additionally, we denote the position of $P$ as $\bm{p}_P := (x_P, h_P)\in\mathcal{X}$ (the projection of $\bm{x}_P$ on $\mathcal{X}$).
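For illustration, the dynamics (1)-(10) can be evaluated numerically as in the following minimal Python sketch. The choice of language, the helper names, and the hard-coded parameter values (taken from Tables 1-2 and Section 5) are our own assumptions and not part of the original formulation.

```python
import numpy as np

# Illustrative constants; values follow Tables 1-2 and Section 5 (assumed, not normative).
S, m, g = 0.2919, 340.1943, 9.81           # reference area [m^2], mass [kg], gravity [m/s^2]
CL1, CD0, CD2 = 1.5658, 0.0612, 1.6537     # aerodynamic coefficients in (9)-(10)
rho0, H = 1.2, 7500.0                      # surface air density [kg/m^3], scale height [m]

def f_P(x_P, alpha):
    """Right-hand side of (2)-(5) for the pursuer state x_P = [x, h, v, gamma] and input alpha."""
    x, h, v, gamma = x_P
    rho = rho0 * np.exp(-h / H)            # exponential atmosphere (8)
    CL = CL1 * alpha                       # lift coefficient (9)
    CD = CD0 + CD2 * alpha**2              # drag coefficient (10)
    L = 0.5 * rho * S * v**2 * CL          # lift force (6)
    D = 0.5 * rho * S * v**2 * CD          # drag force (7)
    return np.array([
        v * np.cos(gamma),                      # x_dot     (2)
        v * np.sin(gamma),                      # h_dot     (3)
        -D / m - g * np.sin(gamma),             # v_dot     (4)
        L / (m * v) - g * np.cos(gamma) / v,    # gamma_dot (5)
    ])

def f_T(u_T, v_T=20.0):
    """Simple-motion target kinematics (11): the target directly commands its horizontal velocity."""
    return np.array([v_T * u_T, 0.0])
```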

The target $T$ is a ground vehicle whose motion is constrained along the horizontal axis. In this paper, we assume the speed of $T$ to be much less than that of $P$, yet $T$ has better maneuverability. In particular, its motion is modelled by the so-called simple motion kinematics [9], such that $T$ can directly control its velocity vector, that is,

\[
\dot{\bm{x}}_T = \mathbf{D} u_T, \qquad \bm{x}_T(0) = \bm{x}_T^0, \tag{11}
\]

where $\bm{x}_T=(x_T,h_T)\in\mathcal{X}$ (resp., $\bm{x}_T^0=(x_T^0,h_T^0)\in\mathcal{X}$) is the state (resp., initial state) and $u_T\in\mathcal{U}_T:=[-1,1]$ is the control input of $T$, with $x_T$ and $h_T$ denoting her horizontal and vertical position, respectively. Also, $\mathbf{D}=[v_T, 0]^\top$, where $v_T\in\mathbb{R}_{>0}$ denotes $T$'s maximum speed. Note that, since we assume the motion of $T$ is constrained along the horizontal axis, it holds that $h_T(t)=h_T^0=0$ for all $t\geq 0$.

Figure 1: Planar pursuit-evasion engagement geometry

2.2 The pursuit-evasion game

In the target interception problem we will discuss, $P$'s goal is to intercept $T$ as soon as possible, while $T$'s goal is to delay the interception. Hence, this interception problem can naturally be formulated as a pursuit-evasion game [9]:

Problem 1 (Two-player pursuit-evasion game)
\begin{align*}
\min_{u_P(\cdot)}\max_{u_T(\cdot)} \quad & J = t_f\\
\text{s.t.} \quad & \dot{\bm{x}}_P = \bm{f}_P(\bm{x}_P, u_P), \quad \bm{x}_P(0) = \bm{x}_P^0,\\
& \dot{\bm{x}}_T = \mathbf{D} u_T, \qquad\qquad\ \ \bm{x}_T(0) = \bm{x}_T^0,\\
& u_P\in\mathcal{U}_P,\ u_T\in\mathcal{U}_T,\\
& \Psi(\bm{x}_P(t_f), \bm{x}_T(t_f)) = 0,
\end{align*}

where $\Psi(\bm{x}_P,\bm{x}_T) := \|\bm{x}_T - \bm{p}_P\|_2 - \epsilon$, with $\epsilon$ denoting the capture radius. In words, given the system models (1) and (11) and the initial positions of $P$ and $T$, we are interested in finding the feedback strategies $u_P(\cdot)$ and $u_T(\cdot)$ such that, if these strategies are applied to (1) and (11), $T$ will be intercepted by $P$ at the time of capture $t_f := \inf\{t\in\mathbb{R}_{\geq 0} : \Psi(\bm{x}_P(t), \bm{x}_T(t)) = 0\}$. The planar geometry of pursuit-evasion between $P$ and $T$ is illustrated in Figure 1, where all the variables therein have been defined except for $\lambda$, which denotes the angle of depression of $P$.

To ensure the feasibility of the game in Problem 1, we make a few assumptions:

Assumption 1

(Perfect information game) Both $P$ and $T$ can observe their opponent’s state at every instant of time.

Assumption 2

(Game of kind) The initial states of both players are given such that capture is guaranteed under optimal play, i.e., $\min_{u_P(\cdot)}\max_{u_T(\cdot)} J < \infty$.

The feedback strategies $u_P^\star(\cdot)$ and $u_T^\star(\cdot)$ that constitute a (feedback) saddle point of the game must satisfy

\[
J(u_P^\star(\cdot), u_T(\cdot)) \leq J(u_P^\star(\cdot), u_T^\star(\cdot)) \leq J(u_P(\cdot), u_T^\star(\cdot)), \tag{12}
\]

which implies that if $P$ plays optimally while $T$ plays non-optimally, $P$ can take advantage of $T$'s non-optimal play and capture $T$ earlier than the saddle point time of capture. Conversely, if $P$ plays non-optimally while $T$ plays optimally, $T$ can delay (or even avoid) the termination of the game. The game in Problem 1 always possesses a saddle point that satisfies (12) because the Hamiltonian of the game

\[
\mathcal{H} = 1 + \bm{\lambda}_P^\top \bm{f}_P + \bm{\lambda}_T^\top \mathbf{D} u_T, \tag{13}
\]

where $\bm{\lambda}_P\in\mathbb{R}^4$ and $\bm{\lambda}_T\in\mathbb{R}^2$ denote co-states, satisfies Isaacs' condition [9]:

\[
\min_{u_P(\cdot)}\max_{u_T(\cdot)} \mathcal{H} = \max_{u_T(\cdot)}\min_{u_P(\cdot)} \mathcal{H}, \tag{14}
\]

since (13) is separable in the control inputs $u_P$ and $u_T$. The Value of the game, which is then assured to exist, is given by

\[
V = \min_{u_P(\cdot)}\max_{u_T(\cdot)} J = \max_{u_T(\cdot)}\min_{u_P(\cdot)} J. \tag{15}
\]

3 Open-Loop Pursuit-Evasion Trajectories

In this section, we first derive the optimal strategy for $T$ in Problem 1 by imposing a few additional assumptions on the dynamical behaviors of $P$ and $T$ in order to obtain an approximate value function. With this strategy, we simplify the game in Problem 1 into $P$'s one-sided optimal control problem and compute his optimal trajectory.

3.1 Approximated saddle point strategy for optimal evasion

First, we assume that $P$'s flight trajectory during the end game is a straight line, i.e., the flight path angle of $P$ aligns with its line of sight ($\gamma_P\approx\lambda$), and that the speed of $P$ decreases linearly with time. Under these assumptions, the motion of $P$ can also be described by the simple motion kinematics (with decreasing maximum speed), and therefore the value function can be approximated as a linear function of the initial distance between the two players as follows:

\[
V := \min_{u_P(\cdot)}\max_{u_T(\cdot)} J = c\,\|\bm{x}_T - \bm{p}_P\|_2, \tag{16}
\]

with $c = (\overline{v}_P - v_T)^{-1}$, where $\overline{v}_P$ denotes the average velocity of $P$ during the flight. The partial derivative of (16) with respect to $\bm{x}_T$ is given by

\[
\frac{\partial V}{\partial \bm{x}_T} = \frac{c\,(\bm{x}_T - \bm{p}_P)^\top}{\|\bm{x}_T - \bm{p}_P\|_2}. \tag{17}
\]

Substituting (17) into the Hamilton-Jacobi-Isaacs (HJI) equation [13] and using the fact that the value function $V$ is time-invariant, one can derive

\begin{align*}
0 &= \min_{u_P\in\mathcal{U}_P}\max_{u_T\in\mathcal{U}_T}\left(1 + \frac{\partial V}{\partial \bm{x}_P}\bm{f}_P + \frac{\partial V}{\partial \bm{x}_T}\mathbf{D} u_T\right)\\
&= 1 + \min_{u_P\in\mathcal{U}_P}\frac{\partial V}{\partial \bm{x}_P}\bm{f}_P + \max_{u_T\in\mathcal{U}_T}\frac{2 c v_T (x_T - x_P)}{\|\bm{x}_T - \bm{p}_P\|_2}\, u_T,
\end{align*}

from which we can conclude that the saddle point strategy for $T$ is given by

\[
u_T^\star(\bm{x}_P(t), \bm{x}_T(t)) =
\begin{cases}
\mathrm{sgn}(x_T(t) - x_P(t)), & \text{if } x_T(t)\neq x_P(t),\\
\text{undefined}, & \text{if } x_T(t) = x_P(t).
\end{cases} \tag{18}
\]

In other words, the initial relative position of $P$ with respect to $T$ in the $x$ coordinate determines the optimal heading direction of $T$. We can further assume that, excluding the case $x_T = x_P$, no switching will occur during the game because, under the assumption that the pursuit trajectory is a straight line, the sign of the relative $x$ position will never switch. Thus, strategy (18) can also be considered an open-loop strategy for $T$.
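The evasion strategy (18), together with the no-switching argument above, reduces to a one-line rule; a minimal sketch (variable names are ours) is:

```python
import numpy as np

def oles(x_P0, x_T0):
    """Open-Loop Evasion Strategy from (18): evade away from the pursuer's initial
    horizontal position; undefined (None) when the horizontal positions coincide."""
    if x_T0 == x_P0:
        return None
    return float(np.sign(x_T0 - x_P0))
```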

3.2 Optimal pursuit trajectory

Despite $P$'s highly nonlinear equations of motion, given that the optimal evasion trajectory of $T$ is a straight line whose direction is determined by the initial relative horizontal distance, the open-loop representation of the saddle point strategy of $P$ is no longer difficult to derive. Given the initial positions of the two players, we first determine $T$'s (open-loop) strategy as $u_T^* = u_T^\star(\bm{x}_P^0, \bm{x}_T^0)$, which we call the Open-Loop Evasion Strategy (OLES). Since the evasion direction of $T$ is time-invariant and known a priori to $P$, Problem 1 can be reduced to a one-sided optimal control problem for $P$ as follows:

Problem 2 (PP’s optimal control problem)
\begin{align*}
\min_{u_P(\cdot)} \quad & J = t_f\\
\text{s.t.} \quad & \dot{\bm{x}}_P = \bm{f}_P(\bm{x}_P, u_P), \quad \bm{x}_P(0) = \bm{x}_P^0,\\
& \dot{\bm{x}}_T = \mathbf{D} u_T^*, \qquad\quad\ \ \bm{x}_T(0) = \bm{x}_T^0,\\
& u_P\in\mathcal{U}_P,\\
& \Psi(\bm{x}_P(t_f), \bm{x}_T(t_f)) = 0.
\end{align*}

Problem 2 can be readily solved using any existing numerical optimal control technique (either direct or indirect). Details on the implementation of such techniques are omitted herein due to space limitations; one can refer instead to standard references in the field (for instance, see [14] and references therein). Such numerical techniques yield the (nominal) minimum time of capture $t_f^*$ as well as the optimal state and input trajectories of $P$, denoted by $\bm{x}_P^*$ and $u_P^*$, respectively. We will also refer to $u_P^*$ as the Open-Loop Pursuit Strategy (OLPS). In addition, the state trajectories of $P$ and $T$ generated by applying the OLPS and OLES are together referred to as the Open-Loop Saddle Point State Trajectory (OLSPT).
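The paper does not prescribe a particular solver for Problem 2. Purely as an illustration, the sketch below transcribes the minimum-time problem with a simple trapezoidal collocation using CasADi's Opti stack and IPOPT (their availability is an assumption of ours); the initial data follow Table 2 and the OLES value follows (18).

```python
import numpy as np
import casadi as ca

# Hypothetical direct-collocation setup for Problem 2 (a sketch, not the authors' implementation).
S, m, g = 0.2919, 340.1943, 9.81
CL1, CD0, CD2 = 1.5658, 0.0612, 1.6537
rho0, H, vT = 1.2, 7500.0, 20.0
xP0 = [-50000.0, 20000.0, 4000.0, -0.4]        # pursuer initial state [x, h, v, gamma] (Table 2)
xT0, uT_star = 0.0, 1.0                        # target initial position and OLES from (18)

def f(x, u):
    """Pursuer dynamics (2)-(5) written with CasADi operators."""
    rho = rho0 * ca.exp(-x[1] / H)
    L = 0.5 * rho * S * x[2]**2 * (CL1 * u)
    D = 0.5 * rho * S * x[2]**2 * (CD0 + CD2 * u**2)
    return ca.vertcat(x[2] * ca.cos(x[3]),
                      x[2] * ca.sin(x[3]),
                      -D / m - g * ca.sin(x[3]),
                      L / (m * x[2]) - g * ca.cos(x[3]) / x[2])

N = 200
opti = ca.Opti()
X = opti.variable(4, N + 1)                    # state trajectory
U = opti.variable(1, N)                        # angle-of-attack trajectory
tf = opti.variable()                           # free final time (the cost J = t_f)
dt = tf / N

opti.minimize(tf)
for k in range(N):                             # trapezoidal collocation defects
    fk, fk1 = f(X[:, k], U[:, k]), f(X[:, k + 1], U[:, k])
    opti.subject_to(X[:, k + 1] == X[:, k] + 0.5 * dt * (fk + fk1))
opti.subject_to(X[:, 0] == ca.DM(xP0))
opti.subject_to(opti.bounded(-np.pi / 18, U, np.pi / 18))        # u_P in U_P
opti.subject_to(X[0, N] == xT0 + vT * uT_star * tf)              # capture condition with epsilon = 0
opti.subject_to(X[1, N] == 0)                                    # intercept at ground level
opti.subject_to(tf >= 1.0)
opti.set_initial(tf, 20.0)
opti.solver("ipopt")
sol = opti.solve()
tf_star, xP_star, uP_star = sol.value(tf), sol.value(X), sol.value(U)   # nominal t_f^*, OLSPT, OLPS
```

Interpolating `xP_star` and `uP_star` in time would then yield the reference trajectory used for the linearization in Section 4.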

4 Auxiliary Zero-Sum LQDG

The OLPS, albeit optimal in an open-loop sense, is in fact not effective against $T$ when the latter makes sub-optimal decisions and/or decisions which are inconsistent with the OLES. This is because the OLPS cannot take into account the deviation of $T$'s state. In other words, as opposed to our discussion of (12), $P$ cannot take advantage of $T$'s sub-optimal play with the OLPS. To cope with possible unexpected maneuvers of $T$, an auxiliary feedback input is needed so that the pursuit trajectory can be continuously reshaped in response to the state deviation of $T$.

Fortunately, in light of the fact that $P$ and $T$ have a large speed difference, we may further assume that any resulting pursuit-evasion trajectory caused by $T$'s evading maneuvers (whether optimal or not) will not differ significantly from the OLSPT. This particular assumption allows us to linearize the dynamics of the players around the OLSPT and construct an auxiliary differential game in the neighborhood of this trajectory. Thereafter, we can derive an auxiliary feedback strategy for $P$ and combine it with the OLPS.

4.1 Linear approximation of game dynamics

To obtain an approximate linear state space model for the equations of motion of $P$ given in (1), valid in the neighborhood of the OLSPT, we take the first-order Taylor series expansion of the vector field $\bm{f}_P$ and substitute the solution of Problem 2 (namely, $t_f^*$, $\bm{x}_P^*$, and $u_P^*$) into the resulting linearized equations. By doing so, we obtain the following linear time-varying (LTV) system model for $P$:

\[
\delta\dot{\bm{x}}_P = \mathbf{A}(t)\,\delta\bm{x}_P + \mathbf{B}(t)\,\delta u_P, \qquad \delta\bm{x}_P(0) = \delta\bm{x}_P^0, \tag{19}
\]

where $\delta\bm{x}_P = \bm{x}_P - \bm{x}_P^*$ and $\delta u_P = u_P - u_P^*$ are the deviations of the current state and input of $P$ from the open-loop ones, respectively, and $\delta\bm{x}_P^0 = \bm{x}_P^0 - \bm{x}_P^{0*}$. Moreover,

\[
\mathbf{A}(t) = \frac{\partial\bm{f}_P}{\partial\bm{x}_P}\bigg|_{\bm{x}_P^*(t),\,u_P^*(t)} =
\begin{bmatrix}
0 & 0 & \cos\gamma_P & -v_P\sin\gamma_P\\
0 & 0 & \sin\gamma_P & v_P\cos\gamma_P\\
0 & \frac{\kappa}{H}v_P^2 C_D & -2\kappa v_P C_D & -g\cos\gamma_P\\
0 & -\frac{\kappa}{H}v_P C_L & \kappa C_L + \frac{g\cos\gamma_P}{v_P^2} & \frac{g\sin\gamma_P}{v_P}
\end{bmatrix}, \tag{20}
\]
\[
\mathbf{B}(t) = \frac{\partial\bm{f}_P}{\partial u_P}\bigg|_{\bm{x}_P^*(t),\,u_P^*(t)} =
\begin{bmatrix}
0\\ 0\\ -2\kappa C_{D,2} v_P^2\alpha\\ \kappa C_{L,1} v_P
\end{bmatrix}, \tag{21}
\]

where $\kappa := S\rho/(2m)$. Note that the matrices $\mathbf{A}(t)$ and $\mathbf{B}(t)$ are only defined over the finite time interval $[0, t_f^*]$.
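As a concrete, purely illustrative realization, the Jacobians (20)-(21) can be evaluated at one sample of the reference trajectory as follows; the parameter values repeat the illustrative ones assumed earlier.

```python
import numpy as np

# Closed-form Jacobians (20)-(21) of the pursuer dynamics, evaluated along the reference
# (starred) trajectory. Parameter values are the same illustrative ones used above.
S, m, g = 0.2919, 340.1943, 9.81
CL1, CD0, CD2 = 1.5658, 0.0612, 1.6537
rho0, H = 1.2, 7500.0

def linearize(xP_star_t, uP_star_t):
    """Return (A(t), B(t)) of the LTV model (19) at one point of the reference trajectory."""
    _, h, v, gamma = xP_star_t
    alpha = uP_star_t
    kappa = S * rho0 * np.exp(-h / H) / (2.0 * m)      # kappa := S * rho / (2 m)
    CL = CL1 * alpha
    CD = CD0 + CD2 * alpha**2
    A = np.array([
        [0.0, 0.0, np.cos(gamma), -v * np.sin(gamma)],
        [0.0, 0.0, np.sin(gamma),  v * np.cos(gamma)],
        [0.0,  kappa / H * v**2 * CD, -2.0 * kappa * v * CD, -g * np.cos(gamma)],
        [0.0, -kappa / H * v * CL, kappa * CL + g * np.cos(gamma) / v**2, g * np.sin(gamma) / v],
    ])
    B = np.array([[0.0],
                  [0.0],
                  [-2.0 * kappa * CD2 * v**2 * alpha],
                  [kappa * CL1 * v]])
    return A, B
```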

Similarly, the model of $T$'s motion near her optimal evasion trajectory can be derived (in an almost trivial way) as

\[
\delta\dot{\bm{x}}_T = \mathbf{D}\,\delta u_T, \qquad \delta\bm{x}_T(0) = \delta\bm{x}_T^0, \tag{22}
\]

which is in the same form as (11).

4.2 Auxiliary differential game

Now, given the linear system models (19) and (22), an auxiliary differential game can be constructed around the OLSPT. The corresponding game dynamics are written as

\[
\dot{\mathbf{x}} = \bm{\mathcal{A}}(t)\,\mathbf{x} + \bm{\mathcal{B}}(t)\,\nu_P + \bm{\mathcal{D}}\,\nu_T, \qquad \mathbf{x}(0) = \mathbf{x}_0, \tag{23}
\]

where $\mathbf{x} = (\delta\bm{x}_T, \delta\bm{x}_P)$ is the joint state deviation, $\mathbf{x}_0 = (\delta\bm{x}_T^0, \delta\bm{x}_P^0)$ is the initial joint state deviation, $\nu_P = \delta u_P$ and $\nu_T = \delta u_T$, and

\begin{align}
\bm{\mathcal{A}}(t) &= \begin{bmatrix} \bm{0}_{2\times 2} & \bm{0}_{2\times 4}\\ \bm{0}_{4\times 2} & \mathbf{A}(t) \end{bmatrix}, \tag{26}\\
\bm{\mathcal{B}}(t) &= \begin{bmatrix} \bm{0}_{2\times 1}\\ \mathbf{B}(t) \end{bmatrix}, \tag{27}\\
\bm{\mathcal{D}} &= \begin{bmatrix} \mathbf{D}\\ \bm{0}_{4\times 1} \end{bmatrix}. \tag{28}
\end{align}

Note again that these matrices are also defined over the finite interval $[0, t_f^*]$ only. Furthermore, the strategies $\nu_P$ and $\nu_T$ are not bounded in this formulation. With these LTV game dynamics, we construct a two-player zero-sum LQDG:

Problem 3 (Auxiliary two-player zero-sum LQDG)
\begin{align*}
\min_{\nu_P(\cdot)}\max_{\nu_T(\cdot)} \quad & \mathcal{J} = w_1\left[\left(\delta x_T - \delta x_P\right)^2 + w_2\left(\delta h_T - \delta h_P\right)^2\right]\\
&\qquad\qquad + \int_0^{t_f^*}\left[\nu_P^2(t) - w_3\nu_T^2(t)\right]\mathrm{d}t\\
\text{s.t.} \quad & \dot{\mathbf{x}} = \bm{\mathcal{A}}(t)\,\mathbf{x} + \bm{\mathcal{B}}(t)\,\nu_P + \bm{\mathcal{D}}\,\nu_T, \quad \mathbf{x}(0) = \mathbf{x}_0,
\end{align*}

where the auxiliary payoff functional $\mathcal{J}$ is a weighted combination of a soft terminal constraint on the miss distance and the accumulated control inputs. The weight coefficients $w_1$, $w_2$, and $w_3$ are positive constants which can be tuned via trial and error. The fixed final time of the game corresponds to the nominal time of capture $t_f^*$, which we have already found in Problem 2. Let us additionally define the matrix

\[
\mathbf{Q} = w_1
\begin{bmatrix}
\begin{matrix} 1 & 0 & -1 & 0\\ 0 & w_2 & 0 & -w_2\\ -1 & 0 & 1 & 0\\ 0 & -w_2 & 0 & w_2 \end{matrix} & \bm{0}_{4\times 2}\\
\bm{0}_{2\times 4} & \bm{0}_{2\times 2}
\end{bmatrix}, \tag{31}
\]

then the auxiliary payoff functional can equivalently be written as

\[
\mathcal{J} = \mathbf{x}^\top(t_f^*)\,\mathbf{Q}\,\mathbf{x}(t_f^*) + \int_0^{t_f^*}\left[\nu_P^2(t) - w_3\nu_T^2(t)\right]\mathrm{d}t. \tag{32}
\]

The feedback saddle point solution of Problem 3, namely $\nu_P^\star$ and $\nu_T^\star$, which satisfies an inequality similar to (12),

\[
\mathcal{J}(\nu_P^\star(\cdot), \nu_T(\cdot)) \leq \mathcal{J}(\nu_P^\star(\cdot), \nu_T^\star(\cdot)) \leq \mathcal{J}(\nu_P(\cdot), \nu_T^\star(\cdot)),
\]

can readily be derived as [15]

\begin{align}
\nu_P^\star(\mathbf{x}(t), t) &= -\bm{\mathcal{B}}^\top(t)\,\mathbf{P}(t)\,\mathbf{x}(t), \tag{33}\\
\nu_T^\star(\mathbf{x}(t), t) &= w_3^{-1}\bm{\mathcal{D}}^\top\mathbf{P}(t)\,\mathbf{x}(t), \tag{34}
\end{align}

where $\mathbf{P}$ corresponds to the solution of the following Matrix Riccati Differential Equation (MRDE):

\[
-\dot{\mathbf{P}} = \bm{\mathcal{A}}^\top(t)\,\mathbf{P} + \mathbf{P}\bm{\mathcal{A}}(t) - \mathbf{P}\bm{\mathcal{S}}(t)\,\mathbf{P}, \qquad \mathbf{P}(t_f^*) = \mathbf{Q}, \tag{35}
\]

with

\[
\bm{\mathcal{S}}(t) := \bm{\mathcal{B}}(t)\,\bm{\mathcal{B}}^\top(t) - w_3^{-1}\bm{\mathcal{D}}\bm{\mathcal{D}}^\top. \tag{36}
\]

By solving the MRDE backward in time, one can compute the feedback gains in (33) and (34). Finally, to ensure that the aggregate feedback strategies meet the input constraints defined in Section 2, we define a function $\phi_i:\mathbb{R}\rightarrow\mathcal{U}_i$ for $i\in\{P,T\}$ such that

\[
\phi_i(u) :=
\begin{cases}
u, & \text{if } u\in\mathcal{U}_i,\\
\sup\mathcal{U}_i, & \text{if } u\geq\sup\mathcal{U}_i,\\
\inf\mathcal{U}_i, & \text{if } u\leq\inf\mathcal{U}_i,
\end{cases} \tag{37}
\]

where $u\in\mathbb{R}$. Hence, the (bounded) aggregate feedback strategies are given by

\begin{align}
\widetilde{u}_P^\star(\mathbf{x}(t), t) &= \phi_P\big(u_P^*(t) + \nu_P^\star(\mathbf{x}(t), t)\big), \tag{38}\\
\widetilde{u}_T^\star(\mathbf{x}(t), t) &= \phi_T\big(u_T^*(t) + \nu_T^\star(\mathbf{x}(t), t)\big). \tag{39}
\end{align}
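A possible offline realization of (31)-(39) is sketched below: the MRDE (35) is integrated backward from $\mathbf{P}(t_f^*)=\mathbf{Q}$ with a standard ODE solver, and the saturated aggregate pursuit input (38) is assembled from the result. The routine `linearize`, the reference interpolants `xP_ref(t)` and `uP_ref(t)`, and the nominal capture time `tf_star` are assumed to come from the earlier sketches; the solver settings are our own choices.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Weights follow Section 5; D is the target input matrix from (11).
w1, w2, w3, vT = 3e-5, 1e3, 1e3, 20.0
D = np.array([[vT], [0.0]])

Q = np.zeros((6, 6))                                   # terminal weight (31) on the joint deviation
Q[:4, :4] = w1 * np.array([[1,   0, -1,   0],          # state ordering: (dx_T, dh_T, dx_P, dh_P, dv_P, dgamma_P)
                           [0,  w2,  0, -w2],
                           [-1,  0,  1,   0],
                           [0, -w2,  0,  w2]])

def augmented(t):
    """Augmented LTV matrices (26)-(28) for the joint deviation state."""
    A, B = linearize(xP_ref(t), uP_ref(t))             # assumed reference interpolants
    calA = np.zeros((6, 6)); calA[2:, 2:] = A
    calB = np.zeros((6, 1)); calB[2:, :] = B
    calD = np.zeros((6, 1)); calD[:2, :] = D
    return calA, calB, calD

def mrde_rhs(t, p_flat):
    """Right-hand side of the MRDE (35), integrated backward from P(tf*) = Q."""
    P = p_flat.reshape(6, 6)
    calA, calB, calD = augmented(t)
    S_t = calB @ calB.T - (1.0 / w3) * (calD @ calD.T) # (36)
    return -(calA.T @ P + P @ calA - P @ S_t @ P).ravel()

riccati = solve_ivp(mrde_rhs, (tf_star, 0.0), Q.ravel(), dense_output=True, max_step=0.05)

def u_P_aggregate(x_dev, t):
    """Bounded aggregate pursuit input (38): OLPS plus auxiliary feedback (33), saturated per (37)."""
    P = riccati.sol(t).reshape(6, 6)
    _, calB, _ = augmented(t)
    nu_P = float(-(calB.T @ P @ x_dev))                # auxiliary saddle point input (33)
    return float(np.clip(uP_ref(t) + nu_P, -np.pi / 18, np.pi / 18))
```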
Table 1: Hypersonic vehicle parameters
Parameter           Symbol                   Value (units)
Reference area      $S$                      0.2919 m$^2$
Mass                $m$                      340.1943 kg
Lift coefficient    $C_{L,1}$                1.5658
Drag coefficients   $C_{D,0}$, $C_{D,2}$     0.0612, 1.6537
Table 2: Initial state variables of players
Variable          Initial value (units)
$x_P^0$           -50,000 m
$h_P^0$           20,000 m
$v_P^0$           4,000 m/s
$\gamma_P^0$      -0.4 rad
$x_T^0$           0 m
$h_T^0$           0 m
Table 3: Different evasion strategies and miss distance
Evasion strategy    Miss distance (units)
$E_1$               13.1660 m
$E_2$               16.8271 m
$E_3$               3.1349 m

5 Numerical Simulations

In this section, we present numerical simulation results for the proposed game-theoretic terminal guidance law. The vehicle parameters of $P$ are specified in Table 1, whereas the initial conditions of $P$ and $T$ are listed in Table 2. The other parameters needed for the simulations are selected as: surface air density $\rho_0 = 1.2~\mathrm{kg/m^3}$, scale height $H = 7{,}500~\mathrm{m}$, and maximum target speed $v_T = 20~\mathrm{m/s}$. The angle of attack $\alpha$ takes values in the compact set $\mathcal{U}_P := [-\pi/18, \pi/18]$. The capture radius $\epsilon$ is assumed to be zero, which means that the game terminates when the positions of $P$ and $T$ coincide.

Figure 2: Pursuit-evasion trajectories
Figure 3: Input trajectories (angle of attack)

We first solve Problem 2 by utilizing a numerical optimal control technique (for instance, direct collocation [14]) to obtain the OLPS. Note that one can use any numerical method here (either direct or indirect); in fact, the OLPS and the corresponding reference trajectory do not have to be optimal (i.e., they may be sub-optimal) in order to apply the aggregate guidance laws proposed in Section 4. However, the closer this reference trajectory is to the optimal one, the smaller the $t_f^*$ one can obtain. From Table 2 we see that $T$ is initially located to the right of $P$ (i.e., $x_P^0 < x_T^0$), which implies that the OLES, according to (18), is given by $u_T^* = 1$. In the following simulation results, we refer to $P$'s optimal trajectory, $\bm{x}_P^*$, as the Time Optimal (TO) trajectory, which is illustrated as a dashed line in Figure 2. In addition, the nominal time of capture obtained from the numerical solver is $t_f^* = 16.1625~\mathrm{s}$.

Figure 4: State trajectories of the hypersonic vehicle

Given the TO trajectory and the nominal time of capture, we can compute the time-varying matrices $\mathbf{A}(t)$ and $\mathbf{B}(t)$, as well as $\bm{\mathcal{A}}(t)$ and $\bm{\mathcal{B}}(t)$, prior to the commencement of the game, all of which are defined over the finite time horizon $[0, t_f^*]$. Given these matrices, the MRDE (35) can be solved numerically (backward in time, as discussed) to compute the feedback gains in (33) and (34), also prior to the commencement of the game. Proper selection of the weight coefficients $w_1$, $w_2$, and $w_3$ is the most critical part of this guidance algorithm, and it must be done via trial and error. The tuning procedure we adopted is as follows: first, we decrease $w_1$ until the auxiliary control input $\nu_P$ no longer overrides the OLPS $u_P^*$ (if it does, $P$ can easily stall). Once this is done, the final position of $P$ may be located at the desired $x$ position but not at the proper $h$ position (non-zero altitude). For this reason, we next increase the value of $w_2$ so that $\delta h_P \rightarrow 0$. Our final choices for the weight coefficients are: $w_1 = 3\cdot 10^{-5}$, $w_2 = 10^3$, and $w_3 = 10^3$.

In order to investigate the performance of our guidance law (38) against unpredictable evasion by $T$, we apply it against a target who employs one of the following three evasion strategies. The first strategy, named $E_1$, is optimal evasion, i.e., $u_T(t) = u_T^*$ for all $t\in[0, t_f^*]$. In this case, we expect $P$ to simply follow (or track) its reference trajectory, since the deviation of $T$'s horizontal position ($\delta x_T$) will always be zero. The second strategy, $E_2$, is evasion in the direction opposite to $E_1$, i.e., $u_T(t) = -u_T^*$ for all $t\in[0, t_f^*]$. From the game-theoretic point of view, this would be the worst strategy for $T$, since it would yield the smallest time of capture if $P$ knew its feedback saddle point strategy. Since $P$ instead employs the proposed guidance law, which is not necessarily optimal for Problem 1, $E_2$ can be used to test the robustness of the latter. Lastly, the third strategy, named $E_3$, is random evasion, which is meant to confuse $P$ by changing the evasion direction randomly; in particular, $T$ periodically chooses her control input as a random sample from the (discrete) uniform distribution over the set $\{-1, 0, 1\}$.
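One engagement under these strategies can be simulated, for example, with the following sketch. The forward-Euler propagation, the one-second switching period for $E_3$, and the helper functions `f_P`, `f_T`, `u_P_aggregate`, `xP_ref`, together with `uT_star`, `vT`, and `tf_star` from the earlier sketches, are all our own assumptions rather than the authors' implementation.

```python
import numpy as np

def simulate(evasion="E3", dt=0.01, switch_period=1.0, seed=0):
    """Closed-loop engagement under evasion strategy E1, E2, or E3; returns the miss distance."""
    rng = np.random.default_rng(seed)
    xP = np.array([-50000.0, 20000.0, 4000.0, -0.4])   # pursuer initial state (Table 2)
    xT = np.array([0.0, 0.0])                          # target initial position (Table 2)
    uT = uT_star
    n_steps = int(tf_star / dt)
    for k in range(n_steps):
        t = k * dt
        if evasion == "E1":
            uT = uT_star                               # optimal (OLES) evasion
        elif evasion == "E2":
            uT = -uT_star                              # evasion opposite to the OLES
        elif evasion == "E3" and k % int(switch_period / dt) == 0:
            uT = rng.choice([-1.0, 0.0, 1.0])          # random switching evasion
        xT_star = np.array([vT * uT_star * t, 0.0])    # OLES reference target position
        x_dev = np.concatenate([xT - xT_star, xP - xP_ref(t)])
        uP = u_P_aggregate(x_dev, t)                   # aggregate feedback law (38)
        xP = xP + dt * f_P(xP, uP)                     # forward-Euler propagation
        xT = xT + dt * f_T(uT)
        # (one could also stop early here if the capture condition Psi = 0 is reached)
    return np.linalg.norm(xT - xP[:2])                 # miss distance at t = tf*

print({e: simulate(e) for e in ("E1", "E2", "E3")})
```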

In Figure 2, the trajectories of $T$ corresponding to $E_1$, $E_2$, and $E_3$ are shown as red curves, whereas the trajectories of $P$ in response to these strategies are shown as black curves with different marker shapes. The miss distances are listed in Table 3, all of which turn out to be reasonably small. Since our terminal guidance law is prescribed with a fixed final time, in all three cases $P$ reaches his final position at $t_f^*$. It is interesting to observe that $P$'s trajectory against $E_1$ is, contrary to our earlier expectation, notably different from the TO trajectory. This is presumably because $P$ applies the OLPS, originally a smooth function obtained from the numerical solver, as a piece-wise continuous function in the numerical simulations. Despite the numerical error introduced by such discretization, however, $P$ is still able to intercept $T$ with a reasonably small miss distance. Finally, the input and state trajectories of $P$ are shown in Figures 3 and 4, respectively.

6 Conclusions

In this paper, we have proposed a terminal feedback guidance law that is based on a linearized approximation of the nonlinear hypersonic dynamics and the saddle point solution of a two-player zero-sum linear quadratic differential game. In the proposed guidance algorithm, given that the target's optimal evasion is confined to one axis and under a few simplifying assumptions, an open-loop reference pursuit trajectory has been computed by solving a hypersonic trajectory optimization problem. The latter trajectory has then been used to linearize the nonlinear game dynamics and construct an auxiliary linear quadratic differential game whose saddle point solution is combined with the open-loop input. We have also presented numerical simulation results to verify that, as long as the weight coefficients are properly selected via trial and error, our proposed guidance law performs well and enables the hypersonic pursuit vehicle to intercept the maneuvering (evading) target in finite time with a reasonably small miss distance. One limitation of the proposed scheme is that it is only applicable to scenarios in which the target's motion is constrained along a line. It is also required that the target's position be known to the hypersonic pursuit vehicle at every time instant in order to compute the feedback control input derived from the auxiliary differential game along the reference pursuit trajectory. Possible future directions of this work include scenarios in which the target may evade on a plane and in which the information given to the hypersonic pursuit vehicle about the target's position is stochastic rather than deterministic.

References

  • [1] D. A. Benson, G. T. Huntington, T. P. Thorvaldsen, and A. V. Rao, “Direct trajectory optimization and costate estimation via an orthogonal collocation method,” Journal of Guidance, Control, and Dynamics, vol. 29, no. 6, pp. 1435–1440, 2006.
  • [2] M. Vedantam, M. R. Akella, and M. J. Grant, “Multi-stage stabilized continuation for indirect optimal control of hypersonic trajectories,” in AIAA Scitech 2020 Forum, 2020, p. 0472.
  • [3] A. E. Bryson and Y.-C. Ho, Applied optimal control: optimization, estimation, and control.   Routledge, 2018.
  • [4] G. R. Eisler and D. G. Hull, “Guidance law for hypersonic descent to a point,” Journal of Guidance, Control, and Dynamics, vol. 17, no. 4, pp. 649–654, 1994.
  • [5] J. Bain and J. Speyer, “Robust neighboring extremal guidance for the advanced launch system,” in Guidance, Navigation and Control Conference, 1993, p. 3750.
  • [6] J. Carson, M. S. Epstein, D. G. MacMynowski, and R. M. Murray, “Optimal nonlinear guidance with inner-loop feedback for hypersonic re-entry,” in 2006 American Control Conference, 2006, pp. 6–pp.
  • [7] V. R. Makkapati, J. Ridderhof, P. Tsiotras, J. Hart, and B. van Bloemen Waanders, “Desensitized trajectory optimization for hypersonic vehicles,” in 2021 IEEE Aerospace Conference (50100), 2021, pp. 1–10.
  • [8] Y. Shi and Z. Wang, “Onboard generation of optimal trajectories for hypersonic vehicles using deep learning,” Journal of Spacecraft and Rockets, vol. 58, no. 2, pp. 400–414, 2021.
  • [9] R. Isaacs, Differential Games: A Mathematical Theory with Applications to Warfare and Pursuit, Control and Optimization.   New York, NY: Wiley, 1965.
  • [10] V. Turetsky and J. Shinar, “Missile guidance laws based on pursuit–evasion game formulations,” Automatica, vol. 39, no. 4, pp. 607–618, 2003.
  • [11] S. Battistini and T. Shima, “Differential games missile guidance with bearings-only measurements,” IEEE Transactions on Aerospace and Electronic Systems, vol. 50, no. 4, pp. 2906–2915, 2014.
  • [12] K. Horie and B. A. Conway, “Optimal fighter pursuit-evasion maneuvers found via two-sided optimization,” Journal of Guidance, Control, and Dynamics, vol. 29, no. 1, pp. 105–112, 2006.
  • [13] T. Başar and G. J. Olsder, Dynamic noncooperative game theory.   SIAM, 1998.
  • [14] M. Kelly, “An introduction to trajectory optimization: How to do your own direct collocation,” SIAM Review, vol. 59, no. 4, pp. 849–904, 2017.
  • [15] D. Li and J. B. Cruz, “Defending an asset: a linear quadratic game approach,” IEEE Transactions on Aerospace and Electronic Systems, vol. 47, no. 2, pp. 1026–1044, 2011.

Yoonjae Lee received the B.S. degree in aerospace engineering from the University of California, San Diego, CA, USA in 2020. He is currently enrolled as a graduate student studying aerospace engineering in the Department of Aerospace Engineering and Engineering Mechanics at the University of Texas at Austin. His research is mainly focused on game theory, multi-agent systems, and pursuit-evasion games.


Efstathios Bakolas (Member, IEEE) received the Diploma degree in mechanical engineering with highest honors from the National Technical University of Athens, Athens, Greece, in 2004 and the M.S. and Ph.D. degrees in aerospace engineering from the Georgia Institute of Technology, Atlanta, GA, USA, in 2007 and 2011, respectively. He is currently an Associate Professor with the Department of Aerospace Engineering and Engineering Mechanics, The University of Texas at Austin, Austin, TX, USA. His research is mainly focused on stochastic optimal control theory, optimal decision-making, differential and dynamic games, control of uncertain systems, data-driven modeling and control of nonlinear systems, and control of aerospace systems.


Maruthi R. Akella is a tenured faculty member with the Department of Aerospace Engineering and Engineering Mechanics at The University of Texas at Austin (UT Austin), where he holds the Ashley H. Priddy Centennial Professorship in Engineering. He is the founding director of the Center for Autonomous Air Mobility and the faculty lead for the Control, Autonomy, and Robotics area at UT Austin. His research program encompasses control theoretic investigations of nonlinear and coordinated systems, vision-based sensing for state estimation, and development of integrated human and autonomous multivehicle systems. He is Editor-in-Chief of the Journal of the Astronautical Sciences and previously served as the Technical Editor (Space Systems) for the IEEE Transactions on Aerospace and Electronic Systems. He is a Fellow of the IEEE and AAS.