
Rationally Inattentive Path-Planning via RRT*

Jeb Stefan1, Ali Reza Pedram2, Riku Funada3 and Takashi Tanaka3 *This work is supported by Lockheed Martin Corporation.1Odyssey Space Research. jeb.stefan@odysseysr.com. 2Walker Department of Mechanical Engineering, University of Texas at Austin. apedram@utexas.edu. 3Department of Aerospace Engineering and Engineering Mechanics, University of Texas at Austin. riku.funada@austin.utexas.edu and ttanaka@utexas.edu.
Abstract

We consider a path-planning scenario for a mobile robot traveling through a configuration space with obstacles in the presence of stochastic disturbances. A novel path length metric is proposed on the uncertain configuration space and then integrated with the existing RRT* algorithm. The metric is a weighted sum of two terms that capture both the Euclidean distance traveled by the robot and the perception cost, i.e., the amount of information the robot must perceive about the environment to follow the path safely. The continuity of the path length function with respect to the topology of the total variation metric is shown, and the optimality of the Rationally Inattentive RRT* algorithm is discussed. Three numerical studies are presented that demonstrate the utility of the new algorithm.

I Introduction

As robots are designed to be more self-reliant in navigating complex and stochastic environments, it is sensible for the strategic execution of perception/cognition tasks to be included in the theory that governs their path-planning [1, 2, 3]. Even though the body of work surrounding motion planning techniques has greatly expanded recently, a technological gap remains in the integration of perception concerns into planning tasks [4]. Mitigating this gap is paramount for missions that require robots to autonomously complete tasks when sensing actions carry high costs (battery power, computing constraints, etc.).

Path-planning is typically followed by feedback control design, which is executed during the path-following phase. In current practice, path-planning and path-following are usually discussed separately (notable exceptions include [5, 6, 7]), and the cost of feedback control (perception cost in particular) is not incorporated into the path-planning phase. The first objective of this work is to fill this gap by introducing a novel path cost function that incorporates the expected perception cost accrued during path-following into the planning phase. This cost jointly penalizes the amount of sensing needed to follow a path and the distance traveled. Our approach is closely related to the concept of rationally inattentive (RI) control [8] (a topic from macroeconomics that has recently been applied in control theory [9, 10]). The aim of rationally inattentive control is to jointly design the control and sensing policies such that the least amount of information (measured in bits) is collected about the environment in order to achieve the desired control.

The second objective of this work is to integrate the proposed path length function with an existing sampling-based algorithm, such as Rapidly-Exploring Random Trees (RRT) [11]. The RRT algorithm is suited for this problem as it has been shown to find feasible paths in motion planning problems quickly. A modified version of this algorithm, RRT* [12], will be utilized as it has the additional property of being asymptotically optimal. We develop an RRT*-like algorithm incorporating the proposed path length function (called the RI-RRT* algorithm) and demonstrate its effectiveness.

While the practical utility of the proposed framework must be thoroughly studied in the future, its expected impact is displayed in Fig. 1. The figure shows an example of a robot moving through a two-dimensional, obstacle-filled environment. Path A (red) represents the path from the origin to the target location that minimizes the Euclidean distance. However, this path requires a large number of sensor actuations to keep the robot's spatial uncertainty small and avoid colliding with obstacles. Alternatively, Path B (blue) allows the covariance to safely grow more along the path. Although Path B travels a greater Euclidean distance to reach the target, it is cheaper in the information-theoretic sense as it requires fewer sensing actions. Therefore, if the perception cost is weighted more heavily than the travel cost, Path B is characterized as the shortest path in the proposed path-planning framework. We demonstrate this effect in a numerical simulation in Section V-C.

The proposed concept of rationally inattentive path-planning provides insight into the mathematical modeling of human experts' skills in path planning [13], especially in terms of an efficiency-simplicity trade-off. Several path-planning algorithms capable of enhancing path simplicity have been proposed in the literature; this list includes potential field approaches [14], multi-resolution perception and path-planning [15, 16], and safe path-planning [17, 18]. The information-theoretic distance function we introduce in this paper can be thought of as an alternative measure of path simplicity, which may provide a suitable model of the human intuition for simplicity in planning. By our standard, a path that requires less sensor information during the path-following phase is "simpler": Path B in Fig. 1 is simpler than Path A, and the simplest path is one that is traceable by an open-loop control policy.

The contributions of this paper are summarized as follows:

  • A novel path cost (RI cost) is formulated which jointly accounts for travel distance and perception cost.

  • The continuity of the path cost with respect to the topology of the total variation metric is shown in the single dimensional case, which is a step forward to guaranteeing the asymptotic optimality of sampling-based algorithms.

  • An RRT*-like algorithm is produced implementing the RI path-planning concept.

Refer to caption

Figure 1: Example of an autonomous robot navigating a two-dimensional configuration space with obstacles. The goal of the robot is to reach the target location. As it moves, the uncertainty of the robot’s exact location in the environment grows, represented by the varying sized covariance ellipses.

Notation: Throughout this work, lower-case symbols denote vectors and upper-case symbols denote matrices. We define $\mathbb{S}^{d}=\{P\in\mathbb{R}^{d\times d}:P\text{ is symmetric}\}$, $\mathbb{S}_{++}^{d}=\{P\in\mathbb{S}^{d}:P\succ 0\}$, and $\mathbb{S}_{\epsilon}^{d}=\{P\in\mathbb{S}^{d}:P\succeq\epsilon I\}$. Bold symbols such as $\bm{x}$ represent random variables. The vector 2-norm is $\|\cdot\|$ and $\|\cdot\|_{F}$ is the Frobenius norm. The maximum singular value of a matrix $M$ is denoted by $\bar{\sigma}(M)$.

II Preliminary Material

In this paper, we consider a path-planning problem for a mobile robot with dynamics given by model (1). Let $\bm{x}(t)$ be an $\mathbb{R}^{d}$-valued random process representing the robot's position at time $t$, given by the controlled Ito process:

$d\bm{x}(t)=\bm{v}(t)\,dt+W^{\frac{1}{2}}\,d\bm{b}(t),$    (1)

with $\bm{x}(0)\sim\mathcal{N}(x_{0},P_{0})$ and $t\in[0,T]$. Here, $\bm{v}(t)$ is the velocity input command, $\bm{b}(t)$ is the $d$-dimensional standard Brownian motion, and $W$ is a given positive definite matrix modeling the process noise intensity. We assume that the robot is commanded to travel at unit velocity (i.e., $\|\bm{v}(t)\|=1$). Let $\mathcal{P}=(0=t_{0}<t_{1}<\cdots<t_{N}=T)$ be a partition of $[0,T]$, not necessarily of equal spacing. Time discretization of (1) based on the Euler-Maruyama method [19] yields:

$\bm{x}(t_{k+1})=\bm{x}(t_{k})+\bm{v}(t_{k})\Delta t_{k}+\bm{n}(t_{k}),$    (2)

where $\Delta t_{k}=t_{k+1}-t_{k}$ and $\bm{n}(t_{k})\sim\mathcal{N}(0,\Delta t_{k}W)$. Introducing a new control input $\bm{u}(t_{k}):=\bm{v}(t_{k})\Delta t_{k}$ and applying the constraint $\|\bm{v}(t_{k})\|=1$, (2) can be written as:

$\bm{x}(t_{k+1})=\bm{x}(t_{k})+\bm{u}(t_{k})+\bm{n}(t_{k}),$    (3)

with $\bm{n}(t_{k})\sim\mathcal{N}(0,\|\bm{u}(t_{k})\|W)$. Due to the unit-velocity assumption above, the time intervals $\Delta t_{k}$, $k=0,1,2,\cdots$, are determined once the command sequence $\bm{u}(t_{0}),\bm{u}(t_{1}),\bm{u}(t_{2}),\cdots$ is specified. Since the physical times $t_{k}$ do not play a significant role in the theoretical development that follows, it is convenient to rewrite (3) as the main dynamics model of this work:

$\bm{x}_{k+1}=\bm{x}_{k}+\bm{u}_{k}+\bm{n}_{k},\quad\bm{n}_{k}\sim\mathcal{N}(0,\|\bm{u}_{k}\|W).$    (4)

Let the probability distribution of the robot position at a given time step $k$ be parametrized by a Gaussian model $\bm{x}_{k}\sim\mathcal{N}(x_{k},P_{k})$, where $x_{k}\in\mathbb{R}^{d}$ is the nominal position and $P_{k}\in\mathbb{S}_{++}^{d}$ is the associated covariance matrix (with $d$ being the dimension of the configuration space). In this paper, we consider a path-planning framework in which the sequence $\{(x_{k},P_{k})\}_{k\in\mathbb{N}}$ is scheduled. Following [20, 21], the product space $\mathbb{R}^{d}\times\mathbb{S}_{++}^{d}$ is called the uncertain configuration space. In what follows, we formulate the problem of finding the shortest path in the uncertain configuration space with respect to a novel information-theoretic path length function.
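To make the discretized model concrete, the following Python sketch propagates one step of (4) in the scalar case $d=1$. The numbers are illustrative assumptions, not values from the paper; note how the prior covariance grows in proportion to the commanded travel distance.

```python
import math
import random

def propagate(x, P, u, W):
    """One step of the scalar (d = 1) dynamics (4): returns the nominal
    next position, the grown prior covariance P_hat = P + |u|*W, and one
    sampled realization of the noisy next state."""
    P_hat = P + abs(u) * W            # covariance grows with distance traveled
    x_next = x + u                    # nominal (mean) position
    sample = x_next + random.gauss(0.0, math.sqrt(abs(u) * W))
    return x_next, P_hat, sample

x1, P_hat1, _ = propagate(x=0.0, P=0.5, u=1.0, W=0.75)
# nominal position 1.0; prior covariance 0.5 + 1.0*0.75 = 1.25
```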

First, an appropriate directed distance function from a point $(x_{k},P_{k})\in\mathbb{R}^{d}\times\mathbb{S}_{++}^{d}$ to another $(x_{k+1},P_{k+1})\in\mathbb{R}^{d}\times\mathbb{S}_{++}^{d}$ is introduced. This function is interpreted as the cost of steering the random state variable $\bm{x}_{k}\sim\mathcal{N}(x_{k},P_{k})$ to $\bm{x}_{k+1}\sim\mathcal{N}(x_{k+1},P_{k+1})$ in the next time step under the dynamics (4). In order to implement the rational inattention concept, we formulate this cost as a weighted sum of the control cost $\mathcal{D}_{\text{cont}}(k)$ and the information cost $\mathcal{D}_{\text{info}}(k)$ of achieving each state transition.

II-A Control Cost

The control cost is simply the commanded travel distance in the Euclidean metric:

$\mathcal{D}_{\text{cont}}(k):=\|x_{k+1}-x_{k}\|.$    (5)

II-B Information Cost

Jointly accounting for both the control efficiency and sensing simplicity in planning necessitates the formulation of a metric that captures the information acquisition cost required for path following. We utilize the information gain (entropy reduction) for this purpose.

Assume that the control input $\bm{u}_{k}=x_{k+1}-x_{k}$ is applied to (4). The prior covariance propagated during the movement of the robot over the time interval $[t_{k},t_{k+1})$ is $\hat{P}_{k}=P_{k}+\|x_{k+1}-x_{k}\|W$. At time $t_{k+1}$, the covariance is "reduced" to $P_{k+1}(\preceq\hat{P}_{k})$ by utilizing a sensor input. The minimum information gain (the minimum number of bits that must be contained in the sensor data) for this transition is:

$\mathcal{D}_{\text{info}}(k)=\frac{1}{2}\log_{2}\det\hat{P}_{k}-\frac{1}{2}\log_{2}\det P_{k+1}.$    (6)

The notion of an "optimal" sensing signal that reduces $\hat{P}_{k}$ to $P_{k+1}$ has been previously discussed in [22] in the context of optimal sensing in filtering theory. The information cost function $\mathcal{D}_{\text{info}}(k)$ in (6) is well-defined for pairs $(P_{k},P_{k+1})$ satisfying $P_{k+1}\preceq\hat{P}_{k}$. For those pairs that do not satisfy $P_{k+1}\preceq\hat{P}_{k}$, we generalize (6) as:

$\mathcal{D}_{\text{info}}(k)=\min_{Q_{k+1}\succeq 0}\;\frac{1}{2}\log_{2}\det\hat{P}_{k}-\frac{1}{2}\log_{2}\det Q_{k+1}$
    s.t. $Q_{k+1}\preceq P_{k+1},\;\;Q_{k+1}\preceq\hat{P}_{k}.$    (7)

Notice that (7) takes a non-negative value for any given transition from an origin $(x_{k},P_{k})$ to a destination $(x_{k+1},P_{k+1})$. However, (7) is an implicit function involving a convex optimization problem (more precisely, a max-det problem [23]). To see why (7) is an appropriate generalization of (6), consider a two-step procedure $\hat{P}_{k}\rightarrow Q_{k+1}\rightarrow P_{k+1}$ to update the prior covariance $\hat{P}_{k}$ to the posterior covariance $P_{k+1}$. In the first step, the uncertainty is "reduced" from $\hat{P}_{k}$ to a $Q_{k+1}$ satisfying both $Q_{k+1}\preceq\hat{P}_{k}$ and $Q_{k+1}\preceq P_{k+1}$. The associated information gain (the amount of telemetry data) is $\frac{1}{2}\log_{2}\det\hat{P}_{k}-\frac{1}{2}\log_{2}\det Q_{k+1}$. In the second step, the covariance $Q_{k+1}$ is "increased" to $P_{k+1}(\succeq Q_{k+1})$. This step incurs no information cost, since the location uncertainty can be increased simply by "deteriorating" the prior knowledge. The max-det problem (7) can then be interpreted as finding the optimal intermediate step $Q_{k+1}$ that minimizes the information gain in the first step.

II-C Total Cost

The cost to steer a random state variable $\bm{x}_{k}\sim\mathcal{N}(x_{k},P_{k})$ to $\bm{x}_{k+1}\sim\mathcal{N}(x_{k+1},P_{k+1})$ is a weighted sum of $\mathcal{D}_{\text{cont}}(k)$ and $\mathcal{D}_{\text{info}}(k)$. Introducing a weight $\alpha>0$, the total RI cost is:

$\mathcal{D}(x_{k},x_{k+1},P_{k},P_{k+1}):=\mathcal{D}_{\text{cont}}(k)+\alpha\,\mathcal{D}_{\text{info}}(k)$
    $=\min_{Q_{k+1}\succeq 0}\;\|x_{k+1}-x_{k}\|+\frac{\alpha}{2}\left[\log_{2}\det\hat{P}_{k}-\log_{2}\det Q_{k+1}\right]$
    s.t. $Q_{k+1}\preceq P_{k+1},\;\;Q_{k+1}\preceq\hat{P}_{k}.$    (8)

By increasing $\alpha$, more weight is placed on the amount of information that must be gained relative to the distance traversed. Note that the information cost $\mathcal{D}_{\text{info}}$ is asymmetric: transitioning $(x_{1},P_{1})\rightarrow(x_{2},P_{2})$ does not in general incur the same cost as $(x_{2},P_{2})\rightarrow(x_{1},P_{1})$.
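In the scalar case $d=1$, the max-det problem in (7)-(8) has the closed-form minimizer $Q_{k+1}=\min(P_{k+1},\hat{P}_{k})$, so the total RI cost can be sketched in a few lines of Python. All numerical values below are illustrative assumptions:

```python
import math

def ri_cost(x0, P0, x1, P1, W, alpha):
    """Scalar (d = 1) RI cost (8): travel distance plus weighted
    information gain. In one dimension the max-det minimizer of (7)
    is simply Q = min(P1, P_hat)."""
    dist = abs(x1 - x0)                    # control cost (5)
    P_hat = P0 + dist * W                  # propagated prior covariance
    Q = min(P1, P_hat)                     # optimal intermediate covariance
    info = 0.5 * (math.log2(P_hat) - math.log2(Q))   # information cost (7)
    return dist + alpha * info

# Moving one unit with W = 1 doubles the covariance (P_hat = 2);
# shrinking it back to 1 then costs half a bit of information.
cost = ri_cost(0.0, 1.0, 1.0, 1.0, W=1.0, alpha=1.0)   # 1.0 + 0.5 = 1.5
```

If the destination covariance is larger than the propagated prior (here $P_{1}=4\succeq\hat{P}=2$), the information term vanishes and only the travel distance remains, matching the non-negativity of (7).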

III Problem Formulation

Having introduced the RI cost function (8), it is now appropriate to introduce the notion of path length. Let $\gamma:[0,T]\rightarrow\mathbb{R}^{d}\times\mathbb{S}_{++}^{d}$, $\gamma(t)=(x(t),P(t))$, be a path. The RI length of a path $\gamma$ is defined as:

$c(\gamma):=\sup_{\mathcal{P}}\sum_{k=0}^{N-1}\mathcal{D}\left(x(t_{k}),x(t_{k+1}),P(t_{k}),P(t_{k+1})\right),$

where the supremum is taken over the space of partitions $\mathcal{P}$ of $[0,T]$. If $\gamma(t)$ is differentiable and $W\succeq\frac{d}{dt}P(t)$ for all $t\in[0,T]$, then it can be shown that:

$c(\gamma)=\int_{0}^{T}\left[\left\|\frac{d}{dt}x(t)\right\|+\frac{\alpha}{2}\,\mathrm{Tr}\!\left(\left(W-\frac{d}{dt}P(t)\right)P^{-1}(t)\right)\right]dt.$
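As a numerical sanity check on this integral (a sketch under assumed values, using the natural-log convention for the continuous-time information rate), consider a one-dimensional unit-speed path with linearly growing covariance $P(t)=P_{0}+\beta t$, $\beta\leq W$. A midpoint Riemann sum of the integrand should match the closed-form antiderivative:

```python
import math

# Riemann-sum check of the continuous path-length formula for a 1-D,
# unit-speed path with linearly growing covariance P(t) = P0 + beta*t,
# beta <= W. All numbers are illustrative assumptions.
T, W, P0, beta, alpha = 2.0, 1.0, 1.0, 0.5, 1.0
N = 100000
dt = T / N

numeric = 0.0
for k in range(N):
    t = (k + 0.5) * dt                    # midpoint rule
    P = P0 + beta * t                     # P(t), with dP/dt = beta
    numeric += (1.0 + 0.5 * alpha * (W - beta) / P) * dt

# Antiderivative of the integrand:
# c = T + (alpha/2) * ((W - beta)/beta) * ln((P0 + beta*T)/P0)
closed_form = T + 0.5 * alpha * (W - beta) / beta * math.log((P0 + beta * T) / P0)
```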

III-A Topology on the path space

In this subsection, we introduce a topology for the space of paths $\gamma:[0,T]\rightarrow\mathbb{R}^{d}\times\mathbb{S}_{++}^{d}$, which is necessary to discuss the continuity of $c(\gamma)$. The space of all such paths can be thought of as a subset (a convex cone) of the space of generalized paths $\gamma:[0,T]\rightarrow\mathbb{R}^{d}\times\mathbb{S}^{d}$. The space of generalized paths is a vector space on which addition and scalar multiplication are defined as $(\gamma_{1}+\gamma_{2})(t)=(x_{1}(t)+x_{2}(t),P_{1}(t)+P_{2}(t))$ and $(\alpha\gamma)(t)=(\alpha x(t),\alpha P(t))$ for $\alpha\in\mathbb{R}$, respectively. Given a partition $\mathcal{P}=(0=t_{0}<t_{1}<\cdots<t_{N}=T)$, the variation $V(\gamma;\mathcal{P})$ of a generalized path $\gamma$ with respect to the choice of $\mathcal{P}$ is given by:

$V(\gamma;\mathcal{P}):=\|x(0)\|+\bar{\sigma}(P(0))+\sum_{k=0}^{N-1}\left[\|\Delta x_{k}\|+\bar{\sigma}(\Delta P_{k})\right],$

where $\Delta x_{k}=x(t_{k+1})-x(t_{k})$ and $\Delta P_{k}=P(t_{k+1})-P(t_{k})$. Using the above definition of the variation of a path, the total variation of a generalized path $\gamma$ is the supremum of the variation over all partitions $\mathcal{P}$:

$|\gamma|_{\text{TV}}:=\sup_{\mathcal{P}}V(\gamma;\mathcal{P}).$

Notice that $|\cdot|_{\text{TV}}$ defines a norm on the space of generalized paths. The following relationship holds between $|\gamma|_{\text{TV}}$ and

$\|\gamma\|_{\infty}:=\sup_{t\in[0,T]}\left[\|x(t)\|+\bar{\sigma}(P(t))\right].$
Lemma 1

[24, Lemma 13.2] For a given path $\gamma$, the following inequality holds:

$\|\gamma\|_{\infty}\leq|\gamma|_{\text{TV}}.$
Proof:

See Appendix A. ∎

In what follows, we endow the space of generalized paths $\gamma:[0,T]\rightarrow\mathbb{R}^{d}\times\mathbb{S}^{d}$ with the topology of the total variation metric $|\gamma_{1}-\gamma_{2}|_{\text{TV}}$, which is then inherited by the space of paths $\gamma:[0,T]\rightarrow\mathbb{R}^{d}\times\mathbb{S}_{++}^{d}$. We denote by $\mathcal{BV}[0,T]$ the space of paths $\gamma:[0,T]\rightarrow\mathbb{R}^{d}\times\mathbb{S}_{++}^{d}$ such that $|\gamma|_{\text{TV}}<\infty$. In the next subsection, we discuss the continuity of the RI path cost $c(\cdot)$ on the space $\mathcal{BV}[0,T]$.

III-B Continuity of RI Cost Function

The continuity of the RI path cost function plays a critical role in determining the theoretical guarantees we can provide when sampling-based algorithms are used to find the shortest RI path. Specifically, the asymptotic optimality (convergence to the minimum-cost path as the number of nodes is increased) of the RRT* algorithm [12], the main numerical method used in this paper, requires continuity of the path cost function. Showing that the RI cost function (8) is continuous requires additional derivation, which is given in Theorem 1.

Theorem 1

When $d=1$, the path cost function $c(\cdot)$ is continuous in the following sense: for every $\gamma\in\mathcal{BV}[0,T]$ with $\gamma:[0,T]\rightarrow\mathbb{R}^{1}\times\mathbb{S}_{2\epsilon}^{1}$, and for every $\epsilon_{0}>0$, there exists $\delta>0$ such that

$|\gamma^{\prime}-\gamma|_{\text{TV}}<\delta\quad\Rightarrow\quad|c(\gamma^{\prime})-c(\gamma)|<\epsilon_{0}.$
Proof:

See Appendix B. ∎

Before discussing the modifications required to implement the RRT* algorithm with the RI cost in Section IV, we first characterize the shortest RI path in obstacle-free space, and then formally define the shortest RI path problem in obstacle-filled spaces in the following subsections.

III-C Shortest Path in Obstacle-Free Space

In obstacle-free space, it can be shown that the optimal path cost between $z_{1}=(x_{1},P_{1})$ and $z_{2}=(x_{2},P_{2})$ is equal to $\mathcal{D}(z_{1},z_{2})$. In other words, the triangle inequality $\mathcal{D}(z_{1},z_{2})\leq\mathcal{D}(z_{1},z_{\text{int}})+\mathcal{D}(z_{\text{int}},z_{2})$ holds for any intermediate state $z_{\text{int}}$. This means it is optimal for the robot to follow the direct path from $x_{1}$ to $x_{2}$ without sensing, and then make a measurement at $x_{2}$ to shrink the uncertainty from $\hat{P}_{1}$ to $P_{2}$. In what follows, we call such a motion plan the "move-and-sense" strategy. The optimality of the move-and-sense path for one-dimensional geometric space is shown in Appendix C. We confirm this optimality by simulation in Section V-A, where the move-and-sense path is the wedge-shaped path depicted in Fig. 3 (a).
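The triangle inequality can be checked numerically in the scalar case. The sketch below (with assumed, illustrative numbers) compares the direct move-and-sense cost $\mathcal{D}(z_{1},z_{2})$ against a detour that senses at an intermediate state $z_{\text{int}}$:

```python
import math

def ri_cost(z_from, z_to, W=1.0, alpha=1.0):
    """Scalar RI cost (8); in d = 1 the max-det minimizer of (7) is
    Q = min(P_to, P_hat)."""
    (x0, P0), (x1, P1) = z_from, z_to
    dist = abs(x1 - x0)
    P_hat = P0 + dist * W
    Q = min(P1, P_hat)
    return dist + alpha * 0.5 * (math.log2(P_hat) - math.log2(Q))

z1, z2 = (0.0, 1.0), (2.0, 1.0)
z_int = (1.0, 1.5)                  # detour state with a mid-way measurement
direct = ri_cost(z1, z2)            # move the whole way, sense once at the end
detour = ri_cost(z1, z_int) + ri_cost(z_int, z2)
# direct <= detour: an intermediate measurement can only add information cost
```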

III-D Shortest Path Formulation

The introduction of obstacles into the path space makes the path-planning problem non-trivial. Let $X_{\text{obs}}\subset\mathbb{R}^{d}$ be a closed set of spatial points representing obstacles. The initial configuration of the robot is defined as $z_{\text{init}}=(x_{0},P_{0})\in\mathbb{R}^{d}\times\mathbb{S}_{++}^{d}$, while $\mathcal{Z}_{\text{target}}\subset\mathbb{R}^{d}\times\mathbb{S}_{++}^{d}$ is a given closed set representing the target region the robot desires to reach. Given a confidence level parameter $\chi^{2}>0$, the shortest RI path problem can be formulated as:

$\min_{\gamma\in\mathcal{BV}[0,T]}\;c(\gamma)$
    s.t. $\gamma(0)=z_{\text{init}},\;\;\gamma(T)\in\mathcal{Z}_{\text{target}},$
    $(x(t)-x_{\text{obs}})^{\top}P^{-1}(t)(x(t)-x_{\text{obs}})\geq\chi^{2}\quad\forall t\in[0,T],\;\forall x_{\text{obs}}\in X_{\text{obs}}.$    (9)

The $\chi^{2}$ term in the constraints of (9) provides a confidence bound on the probability that a robot at position $x(t)$ is not in contact with an obstacle point $x_{\text{obs}}$.
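The obstacle constraint in (9) is straightforward to evaluate pointwise. A minimal Python sketch (with hypothetical numbers; in practice the check must be enforced over all of $X_{\text{obs}}$, e.g., over discretized obstacle boundaries):

```python
def is_safe(x, P_inv, x_obs, chi2):
    """Evaluate the chance constraint of (9) for one obstacle point:
    (x - x_obs)^T P^{-1} (x - x_obs) >= chi2. Vectors are plain lists
    and P_inv is the inverse covariance given as a list of rows."""
    d = [xi - oi for xi, oi in zip(x, x_obs)]
    Pd = [sum(row[j] * d[j] for j in range(len(d))) for row in P_inv]
    quad = sum(di * pdi for di, pdi in zip(d, Pd))
    return quad >= chi2

# With identity covariance the constraint is just squared distance >= chi2
# (chi2 = 4.61 corresponds to a 90% region in two dimensions).
safe = is_safe([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [3.0, 0.0], chi2=4.61)  # True
```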

IV RI-RRT* Algorithm

IV-A RRT*

The RRT algorithm [25] constructs a tree of nodes (state realizations) through random sampling of the feasible state-space and then connects these nodes with edges (tree branches). A user-defined cost is utilized to quantify the length of the edges, which are in turn summed to form path lengths. Each new node is connected via a permanent edge to the existing node that provides the shortest path between the new node and the initial node of the tree. Although the RRT algorithm is known to be probabilistically complete (the algorithm finds a feasible path if one exists), it does not achieve asymptotic optimality (the path cost does not converge to the optimal one as the number of nodes is increased) [12]. The RRT* algorithm [12] attains asymptotic optimality by including an additional "re-wiring" step that re-evaluates whether the path length of each node can be reduced via a connection to the newly created node. This paper utilizes RRT* as a numerical approach to the shortest path problem (9).

IV-B Algorithm

Provided below is the Rationally Inattentive RRT* (RI-RRT*) algorithm for finding a solution to (9). Like the original RRT* algorithm, the RI-RRT* algorithm constructs a graph $G=(Z,E)$ of state nodes and edges in spaces with or without obstacles.

1   $z_{1}\leftarrow z_{\text{init}}$; $E\leftarrow\emptyset$; $G^{\prime}\leftarrow(z_{1},E)$;
2   for $i=2:N$ do
3       $G\leftarrow G^{\prime}$;
4       $z_{i}=(x_{i},P_{i})\leftarrow\textsc{Generate}(i)$;
5       $(Z^{\prime},E^{\prime})\leftarrow(Z,E)$;
6       $z_{\text{near}}\leftarrow\textsc{Nearest}(Z^{\prime},z_{i})$;
7       $z_{\text{new}}\leftarrow\textsc{Scale}(z_{\text{near}},z_{i},ED_{\text{min}})$;
8       if $\textsc{ObsCheck}(z_{\text{near}},z_{\text{new}})=False$ then
9           $Z^{\prime}\leftarrow Z^{\prime}\cup z_{\text{new}}$;
10          $Z_{\text{nbors}}\leftarrow\textsc{Neighbor}(Z,z_{\text{new}},ED_{\text{nbors}})$;
11          $Path_{z_{\text{new}}}\leftarrow realmax$;
12          for $z_{j}\in Z_{\text{nbors}}$ do
13              if $\textsc{ObsCheck}(z_{j},z_{\text{new}})=False$ then
14                  $Path_{z_{\text{new}},j}\leftarrow Path_{z_{j}}+\mathcal{D}(z_{j},z_{\text{new}})$;
15                  if $Path_{z_{\text{new}},j}<Path_{z_{\text{new}}}$ then
16                      $Path_{z_{\text{new}}}\leftarrow Path_{z_{\text{new}},j}$;
17                      $z_{\text{nbor}}^{*}\leftarrow z_{j}$;
18          $E^{\prime}\leftarrow\left[z_{\text{nbor}}^{*},z_{\text{new}}\right]\cup E^{\prime}$;
19          for $z_{j}\in Z_{\text{nbors}}\backslash z_{\text{nbor}}^{*}$ do
20              if $\textsc{ObsCheck}(z_{\text{new}},z_{j})=False$ then
21                  $Path_{z_{j},\text{rewire}}\leftarrow Path_{z_{\text{new}}}+\mathcal{D}(z_{\text{new}},z_{j})$;
22                  if $Path_{z_{j},\text{rewire}}<Path_{z_{j}}$ then
23                      $E^{\prime}\leftarrow E^{\prime}\cup\left[z_{\text{new}},z_{j}\right]\backslash\left[z_{j,\text{parent}},z_{j}\right]$;
24                      $z_{j,\text{parent}}\leftarrow z_{\text{new}}$;
25                      $\textsc{UpdateDes}(G,z_{j})$;
26      $G^{\prime}\leftarrow(Z^{\prime},E^{\prime})$
Algorithm 1 RI-RRT* Algorithm

In Algorithm 1, the $\textsc{Generate}(i)$ function creates a new point by randomly sampling a spatial location $x\in\mathbb{R}^{d}$ and a covariance $P\in\mathbb{S}_{++}^{d}$. Notice that for a $d$-dimensional configuration space, the corresponding uncertain configuration space $\mathbb{R}^{d}\times\mathbb{S}^{d}_{++}$ has $d+\frac{1}{2}d(d+1)$ dimensions from which the samples are generated. The $\textsc{Nearest}(Z,z_{i})$ function finds the nearest existing node $z_{\text{near}}$ in $Z$ to the newly generated state $z_{i}=(x_{i},P_{i})$ in the metric $\hat{\mathcal{D}}(z,z^{\prime}):=\|x-x^{\prime}\|+\|P-P^{\prime}\|_{F}$. Using the metric $\hat{\mathcal{D}}(z,z^{\prime})$, the $\textsc{Scale}(z_{\text{near}},z_{i},ED_{\text{min}})$ function linearly shifts the generated point $z_{i}$ to a new location as:

$z_{\text{new}}=\begin{cases}z_{\text{near}}+\frac{ED_{\text{min}}}{\hat{\mathcal{D}}(z_{i},z_{\text{near}})}\left(z_{i}-z_{\text{near}}\right)&\text{if }\hat{\mathcal{D}}(z_{i},z_{\text{near}})>ED_{\text{min}},\\ z_{i}&\text{otherwise,}\end{cases}$

where EDminED_{\text{min}} is a user-defined constant. In addition to generating znewz_{\text{new}}, the Scale function also ensures that its χ2\chi^{2} covariance region does not interfere with any obstacles.
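A sketch of the hybrid metric $\hat{\mathcal{D}}$ and the Scale steering step in the scalar case, where a node $z=(x,P)$ is a pair of floats and $\|P-P^{\prime}\|_{F}$ reduces to $|P-P^{\prime}|$ (the obstacle check performed by the full function is omitted here):

```python
def d_hat(z, zp):
    """Hybrid metric D^(z, z') = ||x - x'|| + ||P - P'||_F, reduced to
    absolute differences in the scalar case."""
    return abs(z[0] - zp[0]) + abs(z[1] - zp[1])

def scale(z_near, z_i, ed_min):
    """Steer the sampled node z_i toward z_near so that the returned
    node is at most ed_min away from z_near in the D^ metric."""
    dist = d_hat(z_near, z_i)
    if dist > ed_min:
        r = ed_min / dist
        return (z_near[0] + r * (z_i[0] - z_near[0]),
                z_near[1] + r * (z_i[1] - z_near[1]))
    return z_i

z_new = scale(z_near=(0.0, 1.0), z_i=(3.0, 1.0), ed_min=1.0)   # (1.0, 1.0)
```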

Refer to caption
Figure 2: Transition from state $z_{\text{near}}$ to $z_{\text{new}}$. The blue ellipses represent the propagation of covariance; the $\textsc{ObsCheck}(z_{\text{near}},z_{\text{new}})$ function in Algorithm 1 checks collisions between these propagated covariances (including $z_{\text{near}}$ and $z_{\text{new}}$) and obstacles. The measurement at $z_{\text{new}}=(x_{\text{new}},P_{\text{new}})$ shrinks the covariance.

The $\textsc{ObsCheck}(z_{\text{near}},z_{\text{new}})$ function ensures that the transition from state $z_{\text{near}}$ to $z_{\text{new}}$ does not intersect the obstacles. More precisely, we assume the transition $z_{\text{near}}\rightarrow z_{\text{new}}$ follows the move-and-sense path introduced in Section III-C, and the ObsCheck function returns $False$ if all states along the move-and-sense path, shown by the blue ellipses in Fig. 2, have $\chi^{2}$ covariance regions that do not interfere with obstacles.

The $\textsc{Neighbor}(Z,z_{\text{new}},ED_{\text{nbors}})$ function returns the subset of nodes $Z_{\text{nbors}}=\{z_{i}=(x_{i},P_{i})\in Z:\hat{\mathcal{D}}(z_{i},z_{\text{new}})\leq ED_{\text{nbors}}\}$. This set is then evaluated for the presence of obstacles via the ObsCheck function of the previous paragraph. Note that in this instance, the function evaluates obstacle interference along the continuous path of state-covariance pairs from $z_{j}$ to $z_{\text{new}}$. Lines 14-17 of Algorithm 1 connect the new node to the existing graph in an identical manner to RRT*, where $Path_{z}$ denotes the cost of the path from $z_{\text{init}}$ to node $z$ through the edges of $G$. Line 18 creates a new edge between the new node and the existing node from the neighbor group $Z_{\text{nbors}}$ that results in the minimum $Path_{z_{\text{new}}}$. The calculation of the RI path cost in Line 14 utilizes (8).
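The parent-selection step (lines 12-18 of Algorithm 1) can be sketched as follows, with a placeholder edge-cost function and hypothetical node data standing in for the RI cost (8) and the directional ObsCheck:

```python
def choose_parent(z_new, nbors, path_cost, edge_cost, obs_free):
    """Lines 12-18 of Algorithm 1: among the collision-free neighbors,
    pick the parent that minimizes Path_zj + D(z_j, z_new)."""
    best_parent, best_cost = None, float("inf")   # Path_znew <- realmax
    for z_j in nbors:
        if not obs_free(z_j, z_new):              # directional ObsCheck
            continue
        c = path_cost[z_j] + edge_cost(z_j, z_new)
        if c < best_cost:
            best_cost, best_parent = c, z_j
    return best_parent, best_cost

# Hypothetical 1-D example: nodes keyed by position, Euclidean edge cost.
costs = {0.0: 0.0, 1.0: 0.8, 2.5: 2.5}
parent, cost = choose_parent(
    2.0, [0.0, 1.0, 2.5], costs,
    edge_cost=lambda a, b: abs(b - a),
    obs_free=lambda a, b: True)
# parent 1.0, cost 0.8 + 1.0 = 1.8
```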

Lines 19-24 are the tree re-wiring steps of Algorithm 1. In line 20, the ObsCheck function is called again because the move-and-sense path is direction-dependent: $\textsc{ObsCheck}(z_{j},z_{\text{new}})=False$ does not necessarily imply $\textsc{ObsCheck}(z_{\text{new}},z_{j})=False$. Finally, for each rewired node $z_{j}$, its cost $Path_{z_{j}}$ and the costs of its descendants are updated via the $\textsc{UpdateDes}(G,z_{j})$ function in line 25.

To increase the computational efficiency of the RI-RRT* algorithm, we deploy a branch-and-bound technique as detailed in [26]. For a given tree $G$, let $z_{\text{min}}$ be the lowest-cost node of $G$ within $\mathcal{Z}_{\text{target}}$. As discussed in Section III-C, $\mathcal{D}(z,z_{\text{goal}})$ is a lower bound on the cost of transitioning from $z$ to $z_{\text{goal}}$. The branch-and-bound procedure periodically deletes the nodes $Z^{\prime\prime}=\{z\in Z:Path_{z}+\mathcal{D}(z,z_{\text{goal}})\geq Path_{z_{\text{min}}}\}$. This elimination of non-optimal nodes speeds up the RI-RRT* algorithm.
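The pruning rule can be written directly from the set definition of $Z^{\prime\prime}$; the lower-bound function and numbers below are hypothetical placeholders:

```python
def prune(nodes, path_cost, lower_bound, best_goal_cost):
    """Branch-and-bound step: delete every node whose cost-to-come plus
    an admissible lower bound on its cost-to-go matches or exceeds the
    best known cost of reaching the target region."""
    return [z for z in nodes
            if path_cost[z] + lower_bound(z) < best_goal_cost]

# Hypothetical 1-D tree with the goal at x = 5; the obstacle-free RI
# cost D(z, z_goal) is stood in for by the Euclidean distance |5 - x|.
path_cost = {0.0: 0.0, 2.0: 2.0, 4.0: 6.0}
kept = prune([0.0, 2.0, 4.0], path_cost, lambda z: abs(5.0 - z), best_goal_cost=6.0)
# 4.0 is pruned: 6.0 + 1.0 >= 6.0
```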

IV-C Properties of RI-RRT*

The question regarding the asymptotic optimality of RI-RRT* naturally arises. Recall that the proof of the asymptotic optimality of the RRT* algorithm [27] is founded on four main assumptions:

  1. the cost function is additive,

  2. the cost function is monotonic,

  3. there exists a finite distance between all points on the optimal path and the obstacle space,

  4. the cost function is Lipschitz continuous, either in the topology of the total variation metric [27] or the supremum norm metric [12].

The first three assumptions are readily verified for the RI cost (8). However, Theorem 1 does not suffice to guarantee that the RI cost meets the fourth condition for $d\geq 2$. For this reason, the asymptotic optimality of the RI-RRT* algorithm cannot currently be guaranteed, although the numerical simulations of Section V show that the proposed algorithm has merit in rationally inattentive path-planning.

V Simulation Results

V-A One-Dimensional Simulation

Refer to caption
(a) A generated path
Refer to caption
(b) Path cost for 100 runs
Figure 3: Results of the RI-RRT* algorithm with $\alpha=1$ and $W=0.75$ applied to the one-dimensional joint movement-perception problem. (a): The blue line illustrates the path generated with 10,000 nodes, which almost converges to the known optimal path depicted as the red curve. (b): The total path cost for 100 runs of 10,000 nodes each. Gray lines plot the path cost for each run, while their average is shown as the red line. The path costs of all 100 runs approach the optimal path cost (dashed blue line).

The first study is the case of a robot which is allowed to travel at a constant velocity in a one-dimensional geometric space from a predetermined initial position and covariance z_{0}=(x_{0},P_{0}), specified by the red dot in Fig. 3 (a). The robot has the goal of reaching some final state within the blue box representing a goal region, which is a subset of the reachable space. Note that the goal region contains acceptable bounds on both location and uncertainty. Although, in the one-dimensional setting, the strategy which minimizes the control cost is obviously the one that moves directly toward the target region, we utilize the RI-RRT* algorithm to solve the non-trivial measurement scheduling problem.

In Fig. 3 (a), the blue curve represents the path generated by the RI-RRT* algorithm with 10,000 nodes, which is sufficiently close to the shortest path obtainable via the RI-distance, depicted as the red curve. These wedge-shaped optimal paths are created by the “move-and-sense” strategy integrated in the RI-cost, in which a section of covariance propagation is followed by an instantaneous reduction of covariance, as discussed in Section III-C. For example, if the robot were an autonomous ground vehicle with GPS capabilities, then this path signifies the robot driving the total distance without any GPS updates, followed by a reduction of its spatial uncertainty with a single update once the goal region is reached. The minimum path cost at the end of each iteration of the for-loop in Algorithm 1 is depicted in Fig. 3 (b), where the red curve represents the average of 100 independent simulations. The path cost of each simulation approaches the optimal cost.
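The scalar move-and-sense cost can be sketched as follows (a hedged reconstruction, not the paper's implementation: ri_edge_cost and ri_path_cost are illustrative names, with the edge cost taken as Euclidean travel plus the bits needed to shrink the propagated covariance; \alpha=1 and W=0.75 match the values used in Fig. 3):

```python
import math

def ri_edge_cost(x0, x1, P0, P1, alpha=1.0, W=0.75):
    # Euclidean travel plus the information (in bits) required to reduce
    # the covariance propagated during open-loop travel, P0 + W*|x1 - x0|,
    # down to the covariance P1 demanded at the next waypoint.
    dist = abs(x1 - x0)
    P_hat = P0 + W * dist
    info = max(0.0, 0.5 * (math.log2(P_hat) - math.log2(P1)))
    return dist + alpha * info

def ri_path_cost(waypoints, alpha=1.0, W=0.75):
    # waypoints: list of (x, P) pairs along the path
    return sum(ri_edge_cost(x0, x1, P0, P1, alpha, W)
               for (x0, P0), (x1, P1) in zip(waypoints, waypoints[1:]))

# One "move-and-sense" leg: drive from x=0 to x=2 letting the covariance
# grow from 1.0 to 1.0 + 0.75*2 = 2.5, then one update shrinks it to 0.5.
cost = ri_path_cost([(0.0, 1.0), (2.0, 0.5)])
```

Under this cost, splitting the leg into several sensing stops can only keep or increase the total, which is why the wedge-shaped single-update paths emerge.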

V-B Two-Dimensional Asymmetric Simulation

Figure 4: Results of the RI-RRT* algorithm with 10,000 nodes in the two-dimensional space containing roughly two paths, A and B, separated by a diagonal wall. The black line is the shortest path with the associated covariance ellipses. The blue ellipses illustrate the propagation of covariance between nodes. The simulation was completed with W=10^{-3}I and \chi^{2} covariance ellipses representing 90% certainty regions. The boundaries of the plots are considered as obstacles.

The asymmetric characteristic of the RI-cost is demonstrated via a simulation in a two-dimensional configuration space with a diagonal wall, as seen in Fig. 4. The initial position and covariance of the robot are depicted as a red dot, while the target region is illustrated as the black rectangle in the upper-right corner.

The path is generated by the RI-RRT* algorithm by sampling 10,000 nodes. The corresponding sampled covariance ellipses are shown in black, while the blue ellipses represent covariance propagation. As shown in Fig. 4, there are two options: path A requires the robot to move into a funnel-shaped corridor, while path B moves out of a funnel. In this setting, the RI-cost prefers path B even though both A and B have the same Euclidean distance. This asymmetric behavior results from the fact that, as the robot approaches the goal region, path B requires a less severe uncertainty reduction than path A. Similarly, by exchanging the start and goal positions, the RI-cost prefers path A over path B, thus displaying the directional dependency of our efficient sensing strategy.
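The directional dependency described above can be reproduced numerically with the scalar RI-distance (a one-dimensional stand-in for the planar cost; \alpha = W = 1 and all states are illustrative, not taken from Fig. 4):

```python
import math

def ri_dist(x0, P0, x1, P1, alpha=1.0, W=1.0):
    # Scalar RI-distance between states z0 = (x0, P0) and z1 = (x1, P1):
    # Euclidean travel plus the information needed to shrink the
    # propagated covariance P0 + W*|x1 - x0| down to P1.
    dist = abs(x1 - x0)
    info = max(0.0, 0.5 * (math.log2(P0 + W * dist) - math.log2(P1)))
    return dist + alpha * info

# Traveling toward a tight covariance requirement (the analogue of path A,
# which ends inside a narrow funnel) costs more than the reverse direction
# (the analogue of path B), even though the Euclidean length is identical.
forward = ri_dist(0.0, 1.0, 2.0, 0.5)   # must end with small uncertainty
reverse = ri_dist(2.0, 0.5, 0.0, 1.0)   # may end with large uncertainty
```

Since the Euclidean term is the same in both directions, the entire gap comes from the information term, mirroring the A-versus-B preference in Fig. 4.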

V-C Two-Dimensional Simulation with Multiple Obstacles

Figure 5: Results of simulation with \alpha=0, 0.1, and 0.3 ((a), (b), and (c), respectively) under the existence of multiple obstacles. The simulation was completed with W=10^{-3}I and \chi^{2} covariance ellipses representing 90% certainty regions. The boundaries of the plots are considered as obstacles.

In a final demonstration, the RI-RRT* algorithm with \alpha=0, 0.1, and 0.3 is implemented in a two-dimensional configuration space containing multiple obstacles in order to illustrate the effect of varying the information cost. All three paths in the panels of Fig. 5 are generated from 4,000 nodes.

As seen in Fig. 5 (a), when \alpha=0 the algorithm simply finds the path with the shortest Euclidean distance, even if that path requires frequent sensing actions. In contrast, the RI-RRT* algorithm with \alpha=0.3 avoids the constrained pathway and thus requires fewer sensor actuations to prevent obstacle collisions. As a result, the algorithm deviates from the shortest Euclidean path and allows the covariance to propagate safely, as seen in Fig. 5 (c). A moderate path, illustrated in Fig. 5 (b), can also be obtained by choosing \alpha=0.1.
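The role of \alpha can be illustrated with two hypothetical candidate paths, a short corridor demanding heavy sensing and a longer open route demanding little; the lengths and information amounts below are invented for illustration and are not measured from Fig. 5:

```python
# Hypothetical candidates: (Euclidean length, perception demand in bits).
short_len, short_info = 10.0, 20.0   # tight corridor, frequent sensing
open_len,  open_info  = 12.0,  2.0   # open route, covariance may grow

def preferred(alpha):
    # The RI cost weights perception by alpha; the cheaper path wins.
    cost_short = short_len + alpha * short_info
    cost_open  = open_len  + alpha * open_info
    return "short" if cost_short <= cost_open else "open"

choices = {a: preferred(a) for a in (0.0, 0.1, 0.3)}
# With these numbers the preference flips at alpha = 2/18, so alpha = 0
# picks the corridor while alpha = 0.3 picks the open route.
```

This reproduces, in miniature, the qualitative switch between panels (a) and (c) of Fig. 5: increasing \alpha trades extra distance for fewer sensing actions.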

VI Conclusion

In this work, a novel RI cost for use in path-planning algorithms was presented. The cost accounts for both the path distance traversed (efficiency) and the amount of information which must be perceived by the robot. Information gained from perception is important in that it allows the robot to navigate obstacle-filled environments with confidence that collisions will be avoided. By balancing path distance and perception costs, this method provides a simplicity-based path which can be tailored to mimic the results potentially generated by an expert human path-planner. Three numerical simulations were provided to demonstrate these results.

Currently, the preliminary version of the RI-RRT* algorithm is optimized for computational efficiency with the aid of a branch-and-bound technique. Utilizing other RRT* improvement methods, such as k-d trees, could further improve the computational speed of Algorithm 1 and should be considered in future work. In the same vein, it is important to quantify the impact that user-defined constants, such as the distance which determines which nodes are neighbors, have on the results of the RI-RRT* algorithm. The topic of path refinement should also be further explored, as RRT*-like algorithms converge only asymptotically.

It should be noted that once the RI-RRT* algorithm finds an initial feasible path, there exist iterative path-“smoothing” methods which do not require additional node sampling. In a theorized hybrid method, RI-RRT* is first utilized to find some initial path, which an iterative method then “smooths” toward the optimal path. Since the convergence of iterative methods often improves as the initial guess becomes more similar to the optimal path, a computational-efficiency trade-off determines the best time to switch between the two algorithms.

Appendix A Proof of Lemma 1

For every t\in[0,T], set the partition \mathcal{P}=(0,t,T). Then

\displaystyle\|x(t)\|+\bar{\sigma}(P(t))\leq\|x(0)\|+\bar{\sigma}(P(0))
\displaystyle\qquad+\|x(t)-x(0)\|+\bar{\sigma}(P(t)-P(0))
\displaystyle\leq\|x(0)\|+\bar{\sigma}(P(0))+\|x(t)-x(0)\|+\bar{\sigma}(P(t)-P(0))
\displaystyle\qquad+\|x(T)-x(t)\|+\bar{\sigma}(P(T)-P(t))
\displaystyle\leq V(\gamma,\mathcal{P})\leq|\gamma|_{\text{TV}}.

Appendix B Proof of Theorem 1

The proof is based on the following lemma:

Lemma 2

Assume d=1. For each pair (\epsilon,\delta) satisfying 0<\delta\leq\frac{\epsilon}{2}, there exists a constant L_{\epsilon} such that the inequality:

\begin{split}&|\mathcal{D}(x^{\prime}_{k},x^{\prime}_{k+1},P^{\prime}_{k},P^{\prime}_{k+1})-\mathcal{D}(x_{k},x_{k+1},P_{k},P_{k+1})|\\ &\qquad\leq L_{\epsilon}\Bigl{[}\left|(x^{\prime}_{k+1}-x_{k+1})-(x^{\prime}_{k}-x_{k})\right|\\ &\qquad\qquad\qquad+\left|(P^{\prime}_{k+1}-P_{k+1})-(P^{\prime}_{k}-P_{k})\right|\\ &\qquad\qquad\qquad+\delta\left|P_{k+1}-P_{k}\right|+\delta\left|x_{k+1}-x_{k}\right|\Bigr{]}\\ \end{split}

holds for all

\begin{split}&x_{k}^{\prime},x_{k+1}^{\prime},x_{k},x_{k+1}\in\mathbb{R}\;\;\text{and}\;\;P_{k}^{\prime},P_{k+1}^{\prime},P_{k},P_{k+1}\geq\epsilon\end{split}

such that

\begin{split}&\Delta x_{k}:=x_{k}^{\prime}-x_{k}\leq\delta,\;\;\Delta x_{k+1}:=x_{k+1}^{\prime}-x_{k+1}\leq\delta\\ &\Delta P_{k}:=P_{k}^{\prime}-P_{k}\leq\delta,\;\;\Delta P_{k+1}:=P_{k+1}^{\prime}-P_{k+1}\leq\delta.\end{split}
Proof:

For simplicity, we assume α=W=1\alpha=W=1, but the extension of the following proof to general cases is straightforward. In what follows, we write

\displaystyle\mathcal{D}_{\text{info}}(x_{k},x_{k+1},P_{k},P_{k+1})
\displaystyle:=\max\left\{0,\frac{1}{2}\log_{2}(P_{k}+\left|x_{k+1}-x_{k}\right|)-\frac{1}{2}\log_{2}(P_{k+1})\right\}.

We consider four different cases depending on whether \mathcal{D}_{\text{info}}(x_{k}^{\prime},x_{k+1}^{\prime},P_{k}^{\prime},P_{k+1}^{\prime}) and \mathcal{D}_{\text{info}}(x_{k},x_{k+1},P_{k},P_{k+1}) are positive or zero.

Case 1: First, we consider the case with

\displaystyle\mathcal{D}_{\text{info}}(x_{k}^{\prime},x_{k+1}^{\prime},P_{k}^{\prime},P_{k+1}^{\prime})>0\text{ and }
\displaystyle\mathcal{D}_{\text{info}}(x_{k},x_{k+1},P_{k},P_{k+1})>0.

In this case:

|𝒟(xk,xk+1,Pk,Pk+1)𝒟(xk,xk+1,Pk,Pk+1)|=||xk+1+Δxk+1xkΔxk||xk+1xk|+12log2(Pk+ΔPk+|xk+1+Δxk+1xkΔxk|)12log2(Pk+1+ΔPk+1)12log2(Pk+|xk+1xk|)+12log2(Pk+1)|||xk+1+Δxk+1xkΔxk||xk+1xk||+12|log2(1+ΔPkPk+|xk+1+Δxk+1xkΔxk|Pk)log2(1+|xk+1xk|Pk+Pk+|xk+1xk|PkPk+1ΔPk+1)|.\small\begin{split}&\left|\mathcal{D}(x^{\prime}_{k},x^{\prime}_{k+1},P^{\prime}_{k},P^{\prime}_{k+1})-\mathcal{D}(x_{k},x_{k+1},P_{k},P_{k+1})\right|\\ &=\bigg{|}\left|x_{k+1}+\Delta x_{k+1}-x_{k}-\Delta x_{k}\right|-\left|x_{k+1}-x_{k}\right|\\ &\quad+\left.\frac{1}{2}\log_{2}(P_{k}+\Delta P_{k}+\left|x_{k+1}+\Delta x_{k+1}-x_{k}-\Delta x_{k}\right|)\right.\\ &\quad\left.-\frac{1}{2}\log_{2}(P_{k+1}+\Delta P_{k+1})-\frac{1}{2}\log_{2}(P_{k}+\left|x_{k+1}-x_{k}\right|)\right.\\ &\quad+\frac{1}{2}\log_{2}(P_{k+1})\bigg{|}\\ &\leq\bigg{|}\left|x_{k+1}+\Delta x_{k+1}-x_{k}-\Delta x_{k}\right|-\left|x_{k+1}-x_{k}\right|\bigg{|}\\ &\quad+\frac{1}{2}\left|\log_{2}\left(1+\frac{\Delta P_{k}}{P_{k}}+\frac{\left|x_{k+1}+\Delta x_{k+1}-x_{k}-\Delta x_{k}\right|}{P_{k}}\right)\right.\\ &\left.\quad-\log_{2}\left(1+\frac{\left|x_{k+1}-x_{k}\right|}{P_{k}}+\frac{P_{k}+\left|x_{k+1}-x_{k}\right|}{P_{k}P_{k+1}}\Delta P_{k+1}\right)\right|.\end{split} (10)

Using the fact that \left||a+b|-|a|\right|\leq\left|b\right| for a,b\in\mathbb{R}, we have:

\begin{split}&\left|\left|x_{k+1}+\Delta x_{k+1}-x_{k}-\Delta x_{k}\right|-\left|x_{k+1}-x_{k}\right|\right|\\ &\qquad\leq\left|\Delta x_{k+1}-\Delta x_{k}\right|.\end{split}

Noticing that the arguments of the logarithmic terms in (10) are \geq\frac{1}{2}, and using the fact that \left|\log_{2}(a)-\log_{2}(b)\right|\leq 2\log_{2}(e)|a-b|,\;\forall a,b\geq\frac{1}{2}, we have:

12|log2(1+ΔPkPk+|xk+1+Δxk+1xkΔxk|Pk)log2(1+|xk+1xk|Pk+Pk+|xk+1xk|PkPk+1ΔPk+1)|log2(e)||xk+1+Δxk+1xkΔxk|Pk|xk+1xk|Pk+ΔPkPkPk+|xk+1xk|PkPk+1ΔPk+1|log2(e)Pk||xk+1+Δxk+1xkΔxk||xk+1xk||+log2(e)|ΔPkΔPk+1Pk+Pk+1Pk|xk+1xk|PkPk+1ΔPk+1|log2(e)Pk|Δxk+1Δxk|+log2(e)Pk|ΔPk+1ΔPk|+log2(e)|ΔPk+1|PkPk+1|Pk+1Pk|xk+1xk||log2(e)Pk|Δxk+1Δxk|+log2(e)Pk|ΔPk+1ΔPk|+log2(e)|ΔPk+1|PkPk+1|xk+1xk|+log2(e)|ΔPk+1|PkPk+1|Pk+1Pk|log2(e)ϵ|Δxk+1Δxk|+log2(e)ϵ|ΔPk+1ΔPk|+log2(e)δϵ2|xk+1xk|+log2(e)δϵ2|Pk+1Pk|\begin{split}&\frac{1}{2}\left|\log_{2}\left(1+\frac{\Delta P_{k}}{P_{k}}+\frac{\left|x_{k+1}+\Delta x_{k+1}-x_{k}-\Delta x_{k}\right|}{P_{k}}\right)\right.\\ &\left.\quad-\log_{2}\left(1+\frac{\left|x_{k+1}-x_{k}\right|}{P_{k}}+\frac{P_{k}+\left|x_{k+1}-x_{k}\right|}{P_{k}P_{k+1}}\Delta P_{k+1}\right)\right|\\ &\leq\log_{2}(e)\left|\frac{\left|x_{k+1}+\Delta x_{k+1}-x_{k}-\Delta x_{k}\right|}{P_{k}}-\frac{\left|x_{k+1}-x_{k}\right|}{P_{k}}\right.\\ &\left.\quad+\frac{\Delta P_{k}}{P_{k}}-\frac{P_{k}+\left|x_{k+1}-x_{k}\right|}{P_{k}P_{k+1}}\Delta P_{k+1}\right|\\ &\leq\frac{\log_{2}(e)}{P_{k}}\bigg{|}|x_{k+1}+\Delta x_{k+1}-x_{k}-\Delta x_{k}|-|x_{k+1}-x_{k}|\bigg{|}\\ &\quad+\!\log_{2}(e)\bigg{|}\frac{\Delta P_{k}\!-\!\Delta P_{k+1}}{P_{k}}\\ &\hskip 56.9055pt+\frac{P_{k+1}\!-\!P_{k}\!-\!|x_{k+1}\!-\!x_{k}|}{P_{k}P_{k+1}}\Delta P_{k+1}\bigg{|}\\ &\leq\frac{\log_{2}(e)}{P_{k}}|\Delta x_{k+1}-\Delta x_{k}|+\frac{\log_{2}(e)}{P_{k}}|\Delta P_{k+1}-\Delta P_{k}|\\ &\quad+\frac{\log_{2}(e)\left|\Delta P_{k+1}\right|}{P_{k}P_{k+1}}\big{|}P_{k+1}-P_{k}-|x_{k+1}-x_{k}|\big{|}\\ &\leq\frac{\log_{2}(e)}{P_{k}}|\Delta x_{k+1}-\Delta x_{k}|+\frac{\log_{2}(e)}{P_{k}}|\Delta P_{k+1}-\Delta P_{k}|\\ &\quad+\frac{\log_{2}(e)|\Delta P_{k+1}|}{P_{k}P_{k+1}}|x_{k+1}-x_{k}|\\ &\quad+\frac{\log_{2}(e)|\Delta P_{k+1}|}{P_{k}P_{k+1}}|P_{k+1}-P_{k}|\\ &\leq\frac{\log_{2}(e)}{\epsilon}|\Delta x_{k+1}-\Delta x_{k}|+\frac{\log_{2}(e)}{\epsilon}|\Delta P_{k+1}-\Delta P_{k}|\\ 
&\quad+\frac{\log_{2}(e)\delta}{\epsilon^{2}}|x_{k+1}-x_{k}|+\frac{\log_{2}(e)\delta}{\epsilon^{2}}|P_{k+1}-P_{k}|\end{split}

Therefore,

\begin{split}&|\mathcal{D}(x^{\prime}_{k},x^{\prime}_{k+1},P^{\prime}_{k},P^{\prime}_{k+1})-\mathcal{D}(x_{k},x_{k+1},P_{k},P_{k+1})|\\ &\leq\left(1+\frac{\log_{2}(e)}{\epsilon}\right)|\Delta x_{k+1}-\Delta x_{k}|\\ &\quad+\frac{\log_{2}(e)}{\epsilon}|\Delta P_{k+1}-\Delta P_{k}|\\ &\quad+\frac{\log_{2}(e)\delta}{\epsilon^{2}}|x_{k+1}-x_{k}|+\frac{\log_{2}(e)\delta}{\epsilon^{2}}|P_{k+1}-P_{k}|\end{split} (11)

Case 2: Next, we consider the case with

\displaystyle\mathcal{D}_{\text{info}}(x_{k}^{\prime},x_{k+1}^{\prime},P_{k}^{\prime},P_{k+1}^{\prime})>0\text{ and } (12a)
\displaystyle\mathcal{D}_{\text{info}}(x_{k},x_{k+1},P_{k},P_{k+1})=0. (12b)

Notice that (12b) implies P_{k}+|x_{k+1}-x_{k}|-P_{k+1}\leq 0. In this case:

|𝒟(xk,xk+1,Pk,Pk+1)𝒟(xk,xk+1,Pk,Pk+1)|\displaystyle|\mathcal{D}(x^{\prime}_{k},x^{\prime}_{k+1},P^{\prime}_{k},P^{\prime}_{k+1})-\mathcal{D}(x_{k},x_{k+1},P_{k},P_{k+1})|
=||xk+1+Δxk+1xkΔxk||xk+1xk|\displaystyle=\bigg{|}|x_{k+1}+\Delta x_{k+1}-x_{k}-\Delta x_{k}|-|x_{k+1}-x_{k}|
+12log2(Pk+ΔPk+|xk+1+Δxk+1xkΔxk|)\displaystyle\quad+\left.\frac{1}{2}\log_{2}\left(P_{k}+\Delta P_{k}+\left|x_{k+1}+\Delta x_{k+1}-x_{k}-\Delta x_{k}\right|\right)\right.
12log2(Pk+1+ΔPk+1)|\displaystyle\quad-\frac{1}{2}\log_{2}(P_{k+1}+\Delta P_{k+1})\bigg{|} (13a)
|Δxk+1Δxk|12log2(Pk+1+ΔPk+1)\displaystyle\leq\left|\Delta x_{k+1}-\Delta x_{k}\right|-\frac{1}{2}\log_{2}(P_{k+1}+\Delta P_{k+1})
+12log2(Pk+ΔPk+|xk+1+Δxk+1xkΔxk|)\displaystyle\quad+\frac{1}{2}\log_{2}(P_{k}+\Delta P_{k}+|x_{k+1}+\Delta x_{k+1}-x_{k}-\Delta x_{k}|) (13b)
|Δxk+1Δxk|\displaystyle\leq\left|\Delta x_{k+1}-\Delta x_{k}\right|
+log2(e)ϵ(Pk+ΔPk+|xk+1+Δxk+1xkΔxk|\displaystyle\quad+\frac{\log_{2}(e)}{\epsilon}(P_{k}+\Delta P_{k}+|x_{k+1}+\Delta x_{k+1}-x_{k}-\Delta x_{k}|
Pk+1ΔPk+1)\displaystyle\quad-P_{k+1}-\Delta P_{k+1}) (13c)
=|Δxk+1Δxk|\displaystyle=\left|\Delta x_{k+1}-\Delta x_{k}\right|
+log2(e)ϵ(Pk+|xk+1xk|Pk+1+ΔPkΔPk+1\displaystyle\quad+\frac{\log_{2}(e)}{\epsilon}\!\left(P_{k}\!+\!\left|x_{k+1}-x_{k}\right|-P_{k+1}+\Delta P_{k}-\Delta P_{k+1}\right.
+|xk+1+Δxk+1xkΔxk||xk+1xk|)\displaystyle\quad\left.+\left|x_{k+1}+\Delta x_{k+1}-x_{k}-\Delta x_{k}\right|-\left|x_{k+1}-x_{k}\right|\right) (13d)
|Δxk+1Δxk|\displaystyle\leq\left|\Delta x_{k+1}-\Delta x_{k}\right|
+log2(e)ϵ||xk+1+Δxk+1xkΔxk|\displaystyle\quad+\frac{\log_{2}(e)}{\epsilon}\bigg{|}|x_{k+1}+\Delta x_{k+1}-x_{k}-\Delta x_{k}| (13e)
|xk+1xk||\displaystyle\hskip 56.9055pt-|x_{k+1}-x_{k}|\bigg{|}
+log2(e)ϵ|ΔPk+1ΔPk|\displaystyle\quad+\frac{\log_{2}(e)}{\epsilon}\left|\Delta P_{k+1}-\Delta P_{k}\right| (13f)
(1+log2(e)ϵ)|Δxk+1Δxk|\displaystyle\leq\left(1+\frac{\log_{2}(e)}{\epsilon}\right)\left|\Delta x_{k+1}-\Delta x_{k}\right|
+log2(e)ϵ|ΔPk+1ΔPk|\displaystyle\quad+\frac{\log_{2}(e)}{\epsilon}\left|\Delta P_{k+1}-\Delta P_{k}\right| (13g)

In step (13b), we have used the fact that the difference between the two logarithmic terms is positive, because of the hypothesis (12a). In step (13c), we used the fact that \log_{2}a-\log_{2}b\leq\frac{2\log_{2}(e)}{\epsilon}(a-b) for a>b\geq\frac{\epsilon}{2}.

Case 3: Next, we consider the case with

\displaystyle\mathcal{D}_{\text{info}}(x_{k}^{\prime},x_{k+1}^{\prime},P_{k}^{\prime},P_{k+1}^{\prime})=0\text{ and } (14a)
\displaystyle\mathcal{D}_{\text{info}}(x_{k},x_{k+1},P_{k},P_{k+1})>0. (14b)

The first hypothesis (14a) implies:

\displaystyle P_{k}+\Delta P_{k}+|x_{k+1}+\Delta x_{k+1}-x_{k}-\Delta x_{k}|
\displaystyle\qquad-P_{k+1}-\Delta P_{k+1}\leq 0. (15)

Using

\displaystyle\left|x_{k+1}-x_{k}\right|-\left|\Delta x_{k+1}-\Delta x_{k}\right|
\displaystyle\qquad\leq\left|x_{k+1}+\Delta x_{k+1}-x_{k}-\Delta x_{k}\right|,

one can deduce from (15) that:

\begin{split}&P_{k}+|x_{k+1}-x_{k}|-P_{k+1}\\ &\leq\Delta P_{k+1}-\Delta P_{k}+|\Delta x_{k+1}-\Delta x_{k}|\\ &\leq|\Delta P_{k+1}-\Delta P_{k}|+|\Delta x_{k+1}-\Delta x_{k}|.\end{split} (16)

This results in:

\displaystyle|\mathcal{D}(x^{\prime}_{k},x^{\prime}_{k+1},P^{\prime}_{k},P^{\prime}_{k+1})-\mathcal{D}(x_{k},x_{k+1},P_{k},P_{k+1})|
\displaystyle=\bigg{|}\left|x_{k+1}+\Delta x_{k+1}-x_{k}-\Delta x_{k}\right|-\left|x_{k+1}-x_{k}\right|
\displaystyle\quad-\frac{1}{2}\log_{2}(P_{k}+|x_{k+1}-x_{k}|)+\frac{1}{2}\log_{2}(P_{k+1})\bigg{|} (17a)
\displaystyle\leq\left|\Delta x_{k+1}-\Delta x_{k}\right|
\displaystyle\quad+\frac{1}{2}\log_{2}(P_{k}+|x_{k+1}-x_{k}|)-\frac{1}{2}\log_{2}(P_{k+1}) (17b)
\displaystyle\leq\left|\Delta x_{k+1}-\Delta x_{k}\right|+\frac{\log_{2}(e)}{2\epsilon}(P_{k}+|x_{k+1}-x_{k}|-P_{k+1}) (17c)
\displaystyle\leq\left(1+\frac{\log_{2}(e)}{2\epsilon}\right)\left|\Delta x_{k+1}-\Delta x_{k}\right|
\displaystyle\quad+\frac{\log_{2}(e)}{2\epsilon}\left|\Delta P_{k+1}-\Delta P_{k}\right| (17d)

In (17b) we used the fact that the difference between the two logarithmic terms is positive, which follows from the hypothesis (14b). In (17c) we used the fact that \log_{2}a-\log_{2}b\leq\frac{\log_{2}(e)}{\epsilon}(a-b) for a>b\geq\epsilon. Finally, the inequality (16) was used in step (17d).

Case 4: Finally, we consider the case with

\displaystyle\mathcal{D}_{\text{info}}(x_{k}^{\prime},x_{k+1}^{\prime},P_{k}^{\prime},P_{k+1}^{\prime})=0\text{ and }
\displaystyle\mathcal{D}_{\text{info}}(x_{k},x_{k+1},P_{k},P_{k+1})=0.

In this case:

\begin{split}&|\mathcal{D}(x^{\prime}_{k},x^{\prime}_{k+1},P^{\prime}_{k},P^{\prime}_{k+1})-\mathcal{D}(x_{k},x_{k+1},P_{k},P_{k+1})|\\ &\quad=\big{|}\left|x_{k+1}+\Delta x_{k+1}-x_{k}-\Delta x_{k}\right|-|x_{k+1}-x_{k}|\big{|}\\ &\quad\leq|\Delta x_{k+1}-\Delta x_{k}|\end{split} (18)

To summarize, (11), (13g), (17d), and (18) show that choosing L_{\epsilon}=\max\{1+\frac{\log_{2}(e)}{\epsilon},\frac{\log_{2}(e)}{\epsilon^{2}}\} yields the desired result. ∎
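As a numerical sanity check (not part of the proof), the Lipschitz-type bound of Lemma 2 can be probed by random sampling for the scalar RI-distance with \alpha=W=1, using the illustrative choice \epsilon=1 and \delta=1/2:

```python
import math
import random

def ri_dist(xk, xk1, Pk, Pk1):
    # Scalar RI-distance with alpha = W = 1, as assumed in the proof.
    dx = abs(xk1 - xk)
    return dx + max(0.0, 0.5 * (math.log2(Pk + dx) - math.log2(Pk1)))

eps, delta = 1.0, 0.5                     # satisfies 0 < delta <= eps / 2
L_eps = max(1.0 + math.log2(math.e) / eps, math.log2(math.e) / eps ** 2)

rng = random.Random(0)
violations = 0
for _ in range(10_000):
    xk, xk1 = rng.uniform(-5, 5), rng.uniform(-5, 5)
    Pk, Pk1 = rng.uniform(2, 5), rng.uniform(2, 5)  # keeps P and P' >= eps
    dxk, dxk1 = rng.uniform(-delta, delta), rng.uniform(-delta, delta)
    dPk, dPk1 = rng.uniform(-delta, delta), rng.uniform(-delta, delta)
    lhs = abs(ri_dist(xk + dxk, xk1 + dxk1, Pk + dPk, Pk1 + dPk1)
              - ri_dist(xk, xk1, Pk, Pk1))
    rhs = L_eps * (abs(dxk1 - dxk) + abs(dPk1 - dPk)
                   + delta * abs(Pk1 - Pk) + delta * abs(xk1 - xk))
    if lhs > rhs + 1e-9:
        violations += 1
```

Such a check cannot replace the proof, but a nonzero violation count would immediately flag an error in the claimed constant L_{\epsilon}.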

Proof of Theorem 1
Suppose \gamma(t)=(x(t),P(t)) and \gamma^{\prime}(t)=(x^{\prime}(t),P^{\prime}(t)). In what follows, we consider the choice:

\delta=\min\left\{\frac{\epsilon_{0}}{2L_{\epsilon}\left(1+|\gamma|_{\text{TV}}\right)},\frac{\epsilon}{2}\right\} (19)

Since |\gamma^{\prime}-\gamma|_{\text{TV}}<\delta, we have \|\gamma^{\prime}-\gamma\|_{\infty}<\delta. In particular, for each t\in[0,T], we have |x^{\prime}(t)-x(t)|<\delta and |P^{\prime}(t)-P(t)|<\delta. Moreover:

\begin{split}|P^{\prime}(t)|&=|P(t)+P^{\prime}(t)-P(t)|\\ &\geq|P(t)|-|P^{\prime}(t)-P(t)|>2\epsilon-\delta>\epsilon.\end{split}

Therefore, \forall t\in[0,T], we have both P(t)>\epsilon and P^{\prime}(t)>\epsilon. Let \mathcal{P}=(0=t_{0}<t_{1}<\cdots<t_{N}=T) be a partition and define:

c(\gamma;\mathcal{P}):=\sum_{k=0}^{N-1}\mathcal{D}(x(t_{k}),x(t_{k+1}),P(t_{k}),P(t_{k+1})).

For any partition 𝒫\mathcal{P}, the following chain of inequalities holds:

|c(γ;𝒫)c(γ;𝒫)|\displaystyle|c(\gamma^{\prime};\mathcal{P})-c(\gamma;\mathcal{P})|
=|k=0N1𝒟(x(tk),x(tk+1),P(tk),P(tk+1))\displaystyle=\left|\sum_{k=0}^{N-1}\mathcal{D}(x^{\prime}(t_{k}),x^{\prime}(t_{k+1}),P^{\prime}(t_{k}),P^{\prime}(t_{k+1}))\right.
𝒟(x(tk),x(tk+1),P(tk),P(tk+1))|\displaystyle\qquad\qquad-\mathcal{D}(x(t_{k}),x(t_{k+1}),P(t_{k}),P(t_{k+1}))\Bigg{|} (20a)
k=0N1|𝒟(x(tk),x(tk+1),P(tk),P(tk+1))\displaystyle\leq\sum_{k=0}^{N-1}\left|\mathcal{D}(x^{\prime}(t_{k}),x^{\prime}(t_{k+1}),P^{\prime}(t_{k}),P^{\prime}(t_{k+1}))\right.
𝒟(x(tk),x(tk+1),P(tk),P(tk+1))|\displaystyle\left.\qquad\qquad-\mathcal{D}(x(t_{k}),x(t_{k+1}),P(t_{k}),P(t_{k+1}))\right| (20b)
Lϵk=0N1[|(x(tk+1)x(tk+1))(x(tk)x(tk))|.\displaystyle\leq L_{\epsilon}\sum_{k=0}^{N-1}\Bigl{[}|(x^{\prime}(t_{k+1})-x(t_{k+1}))-(x^{\prime}(t_{k})-x(t_{k}))|\Bigr{.}
+|(P(tk+1)P(tk+1))(P(tk)P(tk))|\displaystyle\qquad\left.+|(P^{\prime}(t_{k+1})-P(t_{k+1}))-(P^{\prime}(t_{k})-P(t_{k}))|\right.
+.δ|P(tk+1)P(tk)|+δ|x(tk+1)x(tk)|]\displaystyle\qquad+\Bigl{.}\delta|P(t_{k+1})-P(t_{k})|+\delta|x(t_{k+1})-x(t_{k})|\Bigr{]} (20c)
=Lϵ(V(γγ,𝒫)+δV(γ,𝒫))\displaystyle=L_{\epsilon}(V(\gamma^{\prime}-\gamma,\mathcal{P})+\delta V(\gamma,\mathcal{P})) (20d)
Lϵ(|γγ|TV+δ|γ|TV)\displaystyle\leq L_{\epsilon}\left(|\gamma^{\prime}-\gamma|_{\text{TV}}+\delta|\gamma|_{\text{TV}}\right) (20e)
<Lϵ(δ+δ|γ|TV)\displaystyle<L_{\epsilon}\left(\delta+\delta|\gamma|_{\text{TV}}\right) (20f)
Lϵ(1+|γ|TV)(ϵ02Lϵ(1+|γ|TV))\displaystyle\leq L_{\epsilon}\left(1+|\gamma|_{\text{TV}}\right)\left(\frac{\epsilon_{0}}{2L_{\epsilon}(1+|\gamma|_{\text{TV}})}\right) (20g)
=ϵ02\displaystyle=\frac{\epsilon_{0}}{2} (20h)

The inequality (20c) follows from Lemma 2. Let \{\mathcal{P}_{i}\}_{i\in\mathbb{N}} and \{\mathcal{P}^{\prime}_{i}\}_{i\in\mathbb{N}} be sequences of partitions such that:

\lim_{i\rightarrow\infty}c(\gamma;\mathcal{P}_{i})=c(\gamma),\;\;\lim_{i\rightarrow\infty}c(\gamma^{\prime};\mathcal{P}^{\prime}_{i})=c(\gamma^{\prime}), (21)

and let \{\mathcal{P}^{\prime\prime}_{i}\}_{i\in\mathbb{N}} be the sequence of partitions such that for each i\in\mathbb{N}, \mathcal{P}^{\prime\prime}_{i} is a common refinement of \mathcal{P}_{i} and \mathcal{P}^{\prime}_{i}. Since both

\displaystyle c(\gamma;\mathcal{P}_{i})\leq c(\gamma;\mathcal{P}^{\prime\prime}_{i})\leq c(\gamma)\;\text{and}\;c(\gamma^{\prime};\mathcal{P}^{\prime}_{i})\leq c(\gamma^{\prime};\mathcal{P}^{\prime\prime}_{i})\leq c(\gamma^{\prime})

hold for each i\in\mathbb{N}, (21) implies

\lim_{i\rightarrow\infty}c(\gamma;\mathcal{P}^{\prime\prime}_{i})=c(\gamma),\;\;\lim_{i\rightarrow\infty}c(\gamma^{\prime};\mathcal{P}^{\prime\prime}_{i})=c(\gamma^{\prime}). (22)

Now, since the chain of inequalities (20) holds for any partition,

|c(\gamma;\mathcal{P}^{\prime\prime}_{i})-c(\gamma^{\prime};\mathcal{P}^{\prime\prime}_{i})|<\frac{\epsilon_{0}}{2}

holds for all i\in\mathbb{N}. This results in:

\displaystyle|c(\gamma)-c(\gamma^{\prime})|=\lim_{i\rightarrow\infty}|c(\gamma;\mathcal{P}^{\prime\prime}_{i})-c(\gamma^{\prime};\mathcal{P}^{\prime\prime}_{i})|
\displaystyle\leq\frac{\epsilon_{0}}{2}<\epsilon_{0}. (23)

where the equality in (23) follows from (22).

Appendix C One-Dimensional Problem Optimal Path

Consider taking the single-perception optimal path \gamma_{1} in Fig. 3 (a):

(x_{0},P_{0})\rightarrow(x_{T},P_{T})

and dividing it into the combination of two sub-paths, denoted \gamma_{2}:

\begin{split}&(x_{0},P_{0})\rightarrow(x_{a},P_{a})\rightarrow(x_{T},P_{T})\\ &\text{such that}\;\;\hat{P}_{0}^{\prime}=P_{0}+\beta\|x_{T}-x_{0}\|W>P_{a},\\ &\qquad\qquad\hat{P}_{a}^{\prime}=P_{a}+(1-\beta)\|x_{T}-x_{0}\|W>P_{T}\end{split}

where \beta\in(0,1) is a constant which denotes where in \gamma_{2} the additional sensing action takes place. The combination of the divided sub-paths has the same initial (z_{0}) and final (z_{T}) states as the original path, but also achieves an intermediate state (z_{a}).

Path \gamma_{1} has a total RI cost:

\displaystyle\mathcal{D}(\gamma_{1})=\|x_{T}-x_{0}\|+\frac{\alpha}{2}\left[\log_{2}\hat{P}_{0}-\log_{2}P_{T}\right],

where \hat{P}_{0}=P_{0}+\|x_{T}-x_{0}\|W. Likewise, the path \gamma_{2} has a cost which is the summation of two information-gain terms while traversing the same distance as \gamma_{1}:

\displaystyle\mathcal{D}(\gamma_{2})=\|x_{T}-x_{0}\|+\frac{\alpha}{2}\left[\log_{2}\hat{P}_{0}^{\prime}-\log_{2}P_{a}\right]
\displaystyle\qquad\qquad+\frac{\alpha}{2}\left[\log_{2}\hat{P}_{a}^{\prime}-\log_{2}P_{T}\right]

Comparing the costs of \gamma_{1} and \gamma_{2} yields:

\displaystyle\mathcal{D}(\gamma_{1})-\mathcal{D}(\gamma_{2})=\frac{\alpha}{2}\left[\log_{2}\hat{P}_{0}-\log_{2}P_{T}\right]
\displaystyle\qquad-\frac{\alpha}{2}\left[\log_{2}\hat{P}_{0}^{\prime}-\log_{2}P_{a}\right]-\frac{\alpha}{2}\left[\log_{2}\hat{P}_{a}^{\prime}-\log_{2}P_{T}\right]
\displaystyle=\frac{\alpha}{2}\left[\log_{2}\hat{P}_{0}-\log_{2}\hat{P}_{0}^{\prime}\right]-\frac{\alpha}{2}\left[\log_{2}\hat{P}_{a}^{\prime}-\log_{2}P_{a}\right]
\displaystyle=f(\hat{P}_{0}^{\prime})-f(P_{a})<0,

where f(P)=\frac{\alpha}{2}\left[\log_{2}(P+(1-\beta)\|x_{T}-x_{0}\|W)-\log_{2}P\right]. The last inequality follows from the facts that \hat{P}_{0}^{\prime}>P_{a} and that f(P) is a decreasing function (\frac{df}{dP}<0). Hence, the single-perception path \gamma_{1} is cheaper.
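The comparison above can be verified numerically (illustrative constants \alpha=1, W=0.75, \beta=1/2, chosen so that \hat{P}_{0}^{\prime}>P_{a} and \hat{P}_{a}^{\prime}>P_{T} hold):

```python
import math

# Illustrative one-dimensional instance of the argument above.
alpha, W, beta = 1.0, 0.75, 0.5
x0, xT = 0.0, 2.0
P0, Pa, PT = 1.0, 0.6, 0.5
L = abs(xT - x0)

def info(P_from, P_to):
    # Information cost of instantaneously reducing covariance P_from to P_to.
    return 0.5 * alpha * (math.log2(P_from) - math.log2(P_to))

# gamma_1: single perception at the end of the path.
P0_hat = P0 + L * W
cost_single = L + info(P0_hat, PT)

# gamma_2: two perceptions, with an intermediate reduction to Pa.
P0_hat_prime = P0 + beta * L * W          # exceeds Pa, as required
Pa_hat_prime = Pa + (1 - beta) * L * W    # exceeds PT, as required
cost_double = L + info(P0_hat_prime, Pa) + info(Pa_hat_prime, PT)
```

With these constants cost_single is strictly smaller than cost_double, matching the sign of f(\hat{P}_{0}^{\prime})-f(P_{a}) derived above.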

References

  • [1] S. Pendleton, H. Andersen, X. Du, X. Shen, M. Meghjani, Y. Eng, D. Rus, and M. Ang, “Perception, planning, control, and coordination for autonomous vehicles,” Machines, vol. 5, no. 1, p. 6, 2017.
  • [2] M. Pfeiffer, M. Schaeuble, J. Nieto, R. Siegwart, and C. Cadena, “From perception to decision: A data-driven approach to end-to-end motion planning for autonomous ground robots,” in Proc. IEEE Int. Conf. Robot. Autom., 2017, pp. 1527–1533.
  • [3] L. Carlone and S. Karaman, “Attention and anticipation in fast visual-inertial navigation,” IEEE Trans. Robot., vol. 35, no. 1, pp. 1–20, 2019.
  • [4] R. Alterovitz, S. Koenig, and M. Likhachev, “Robot planning in the real world: research challenges and opportunities,” AI Magazine, vol. 37, no. 2, pp. 76–84, 2016.
  • [5] Y. Kuwata, J. Teo, S. Karaman, G. Fiore, E. Frazzoli, and J. How, “Motion planning in complex environments using closed-loop prediction,” AIAA Guid. Navi. Control Conf. Exhibit, 2008.
  • [6] J. Van Den Berg, P. Abbeel, and K. Goldberg, “LQG-MP: Optimized path planning for robots with motion uncertainty and imperfect state information,” Int. J. Robot. Res., vol. 30, no. 7, pp. 895–913, 2011.
  • [7] A.-A. Agha-Mohammadi, S. Chakravorty, and N. M. Amato, “FIRM: Sampling-based feedback motion-planning under motion uncertainty and imperfect measurements,” Int. J. Robot. Res., vol. 33, no. 2, pp. 268–304, 2014.
  • [8] C. A. Sims, “Implications of rational inattention,” J. Monetary Economics, vol. 50, no. 3, pp. 665–690, 2003.
  • [9] E. Shafieepoorfard, M. Raginsky, and S. P. Meyn, “Rationally inattentive control of markov processes,” SIAM J. Control Optimization, vol. 54, no. 2, pp. 987–1016, 2016.
  • [10] E. Shafieepoorfard and M. Raginsky, “Rational inattention in scalar lqg control,” in Proc. Conf. Decision Control, 2013, pp. 5733–5739.
  • [11] S. M. LaValle, Planning algorithms.   Cambridge University Press, 2006.
  • [12] S. Karaman and E. Frazzoli, “Incremental sampling-based algorithms for optimal motion planning,” in Proc. Robot.: Sci. Syst., 2010.
  • [13] J. J. Marquez and M. L. Cummings, “Design and evaluation of path planning decision support for planetary surface exploration,” J. Aerosp. Comput., Info., Comm., vol. 5, no. 3, pp. 57–71, 2008.
  • [14] Y. K. Hwang and N. Ahuja, “A potential field approach to path planning,” IEEE Trans. Robot. Autom., vol. 8, no. 1, pp. 23–32, 1992.
  • [15] S. Kambhampati and L. Davis, “Multiresolution path planning for mobile robots,” IEEE J. Robot. Autom., vol. 2, no. 3, pp. 135–145, 1986.
  • [16] F. Hauer, A. Kundu, J. M. Rehg, and P. Tsiotras, “Multi-scale perception and path planning on probabilistic obstacle maps,” in Proc. IEEE Int. Conf. Robot. Autom., 2015, pp. 4210–4215.
  • [17] A. Lambert and D. Gruyer, “Safe path planning in an uncertain-configuration space,” in Proc. IEEE Int. Conf. Robot. Autom., vol. 3, 2003, pp. 4185–4190.
  • [18] R. Pepy and A. Lambert, “Safe path planning in an uncertain-configuration space using RRT,” in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst., 2006, pp. 5376–5381.
  • [19] P. Kloeden and E. Platen, Numerical Solution of Stochastic Differential Equations.   Berlin: Springer, 1992.
  • [20] A. Lambert and D. Gruyer, “Safe path planning in an uncertain-configuration space,” in Proc. IEEE Int. Conf. Robot. Autom., vol. 3, 2003, pp. 4185–4190.
  • [21] R. Pepy and A. Lambert, “Safe path planning in an uncertain-configuration space using RRT,” in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst., 2006, pp. 5376–5381.
  • [22] T. Tanaka, K.-K. Kim, P. A. Parrilo, and S. K. Mitter, “Semidefinite programming approach to Gaussian sequential rate-distortion trade-offs,” IEEE Trans. Automat. Control, vol. 62, no. 4, pp. 1896–1910, 2016.
  • [23] L. Vandenberghe, S. Boyd, and S.-P. Wu, “Determinant maximization with linear matrix inequality constraints,” SIAM J. Matrix Analysis and Applications, vol. 19, no. 2, pp. 499–533, 1998.
  • [24] N. L. Carothers, Real analysis.   Cambridge University Press, 2000.
  • [25] S. M. LaValle and J. J. Kuffner, “Randomized kinodynamic planning,” Int. J. Robot. Res., vol. 20, no. 5, pp. 378–400, May 2001.
  • [26] S. Karaman, M. R. Walter, A. Perez, E. Frazzoli, and S. Teller, “Anytime motion planning using the RRT,” in Proc. IEEE Int. Conf. Robot. Autom., 2011, pp. 1478–1483.
  • [27] S. Karaman and E. Frazzoli, “Sampling-based algorithms for optimal motion planning,” Int. J. Robot. Res., vol. 30, no. 7, pp. 846–894, 2011.