
Incrementally Stochastic and Accelerated Gradient Information mixed Optimization for Manipulator Motion Planning

Yichang Feng, Jin Wang, Haiyun Zhang, Guodong Lu

Manuscript received March 24, 2022; accepted June 26, 2022. This paper was recommended for publication by Editor Hanna Kurniawati upon evaluation of the Associate Editor and Reviewers' comments. This work was supported by the Key R&D Program of Zhejiang Province (2020C01025, 2020C01026), the National Natural Science Foundation of China (52175032), and the Robotics Institute of Zhejiang University Grant (K11808). The authors are with 1) the State Key Laboratory of Fluid Power and Mechatronic Systems, School of Mechanical Engineering, Zhejiang University, Hangzhou 310027, China, and 2) the Engineering Research Center for Design Engineering and Digital Twin of Zhejiang Province, School of Mechanical Engineering, Zhejiang University, Hangzhou 310027, China. Corresponding author: Jin Wang, dwjcom@zju.edu.cn. Digital Object Identifier (DOI): see top of this page.
Abstract

This paper introduces a novel motion planner, incrementally stochastic and accelerated gradient information mixed optimization (iSAGO), for robotic manipulators in a narrow workspace. First, we propose the overall scheme of iSAGO, which is informed by mixed momenta for efficient constrained optimization based on the penalty method. In the stochastic part, we generate adaptive stochastic momenta via the random selection of sub-functionals based on the adaptive momentum (Adam) method to solve the body-obstacle stuck case. Since the stochastic part converges slowly, we integrate accelerated gradient descent (AGD) to improve the planning efficiency. Moreover, we adopt Bayesian tree inference (BTI) to transform the whole-trajectory optimization (SAGO) into an incremental sub-trajectory optimization (iSAGO), which further improves the computation efficiency and success rate. Finally, we tune the key parameters and benchmark iSAGO against five other planners on an LBR-iiwa at a bookshelf and an AUBO-i5 at a storage shelf. The results show that iSAGO attains the highest success rate with moderate solving efficiency.

Index Terms:
Constrained Motion Planning, Collision Avoidance, Manipulation Planning

I INTRODUCTION

Industrial manufacturing widely applies robots in areas such as welding and product loading and placing. Most of these robots execute such tasks under manual teaching or programming, so automated production calls for an efficient and robust optimal motion planning (OMP) algorithm to elevate industrial intelligence.

Though former OMP studies obtain a collision-free optimal trajectory for safe and smooth motion via numerical optimization [1, 2, 3, 4] or probabilistic sampling [5, 6, 7, 8, 9, 10], two main concerns remain, which this paper aims to address:

(i) Reliability: The numerical methods, such as CHOMP [1], GPMP [2], and TrajOpt [3], can rapidly converge to a minimum with descent steps informed by the deterministic momenta/gradients. However, local minima (i.e., failed plannings) are unavoidable with an inappropriate initial point, because the momentum information only describes the manifold of a local space near the initial point. So it is difficult for them to plan a safe motion with a high success rate (i.e., high reliability) in a narrow space.

(ii) Efficiency: Sampling methods like STOMP [8] directly sample trajectories to approach the optimum. Others like RRT-Connect [7] grow a search tree from randomly sampled waypoints. Their sampling and wiring processes can generate safe trajectories without manifold information. However, their efficiency highly depends on the proportion of the overall search space occupied by the feasible subspace. So the indiscriminate process consumes enormous computation resources (i.e., low efficiency) in a narrow space.

Figure 1: A block diagram illustrating iSAGO (Algorithm 1, Section IV-A), which plans a collision-free smooth motion. The outer layer shows how a whole trajectory is refined via BTI-based sub-trajectory optimization, and PenIter integrates AGD (Algorithm 2, Section IV-B) and STOMA (Algorithm 3, Section IV-C) to solve the non-convex OMP by detecting the stuck case.

This paper proposes iSAGO (Figure 1), which integrates the Stochastic and Accelerated information of Gradients and builds a Bayes tree for an incremental Optimization:

(i) To address reliability, stochastic trajectory optimization with moment adaptation (STOMA, Section IV-C) overcomes the local minima caused by body-obstacle stuck cases by randomly selecting variables, such as collision-check balls, time intervals, and penalty factors. The stochastic momenta effectively perturb OMP's manifold so that it presents fewer local minima.

(ii) Considering the low efficiency of STOMA, whose convergence rate is $\mathcal{O}(\log N/N)$, iSAGO integrates accelerated gradient descent (AGD, Section IV-B) in the non-stuck case because its $\mathcal{O}(1/N^{3/2})$ convergence rate of the gradient norm is proven optimal in first-order convex optimization. Furthermore, iSAGO adopts Bayes tree inference to optimize the trajectory incrementally (Section IV-A) for a further efficiency gain, optimizing the convex and non-convex sub-trajectories separately within iSAGO's reTrajIter loop.

The experiments on 15 planning tasks with 44 problems on LBR-iiwa and AUBO-i5 (Section V-B) validate the higher reliability of iSAGO compared with the numerical methods [1, 2, 3] and its higher efficiency compared with the sampling methods [8, 7].

II RELATED WORKS

II-A Numerical Optimization

The main concern of numerical optimization is rapidly descending to an optimum. CHOMP [1], ITOMP [11], and GPMP [2] adopt gradient descent with a fixed step size. CHOMP uses Hamiltonian Monte Carlo (HMC) [12] to improve the success rate. To lower the computational cost, GPMP and dGPMP [4] adopt iSAM2 [13] for incremental planning, where each sub-planning converges at a super-linear rate via the Levenberg–Marquardt (LM) algorithm [14]. Meanwhile, ITOMP interleaves planning with task execution over short time periods to adapt to dynamic environments. Moreover, TrajOpt [3] uses the trust-region method [15] to improve efficiency; this method is also adopted by GOMP [16] for grasp-optimized motion planning with multiple warm restarts learned from a deep neural network. Instead of deep learning, ISIMP [17] interleaves sampling and interior-point optimization for planning. Nevertheless, the above methods may converge to local minima when the objective function is not strongly convex.

AGD [18, 19] has recently been developed and applied to the optimal control [20, 21] of dynamic systems described by differential equations, whether or not the objective is strongly convex. Moreover, stochastic optimization employs the momentum theorems of AGD to solve large-scale semi-convex cases [22], generating stochastic sub-gradients via random sub-dataset selection. Adam [23] upgrades RMSProp [24] and introduces an exponential moving average (EMA) for momentum adaptation. Furthermore, [25] adopts Adam to overcome the disturbance of a UAV system.

II-B Probabilistic Sampling

Unlike numerical methods, sampling methods construct a search graph to query feasible solutions or iteratively sample trajectories for motion planning.

PRM [5] and its asymptotically-optimal variants like PRM* [9] and RGGs [26] build collision-free connections among feasible vertices to construct a roadmap. They then construct an optimal trajectory via shortest-path (SP) algorithms like Dijkstra [27] and Chehov [28], which store and query partial trajectories efficiently. Unlike PRM associated with SP, RRT [6] and its asymptotically-optimal variants like RRT* [9] and CODEs [10] find a feasible solution by growing rapidly-exploring random trees (RRTs).

STOMP [8] resamples trajectories from a Gaussian distribution updated by importance samples. [29] introduces GPMP's functional to improve the search efficiency of STOMP. Moreover, SMTO [30] applies Monte-Carlo optimization to find multiple solutions and refines them numerically.

III PROBLEM FORMULATION

III-A Objective functional

We adopt the probabilistic inference model of GPMP [2] to infer a collision-free optimal trajectory given an environment with obstacles, a manipulator’s arm, start, and goal.

III-A1 GP prior

A Gaussian process (GP) describes a multivariate Gaussian distribution of the trajectory $\bm{\theta}\sim\mathcal{GP}(\bm{\mu},\bm{\mathcal{K}})$. $\bm{\mu}_{\mathbf{T}}=[\mu_{i}^{\top}]^{\top}\big|_{i\in\mathbf{T}}$ and $\bm{\mathcal{K}}_{\mathbf{T}}=[\mathcal{K}_{i,j}]\big|_{i,j\in\mathbf{T}}$ determine the expectation and covariance matrix of the trajectory $\bm{\theta}_{\mathbf{T}}=[\theta_{i}^{\top}]^{\top}\big|_{i\in\mathbf{T}}$, respectively, where the set $\mathbf{T}=\{t_{1},\dots,t_{2}\}$ denotes the period from $t_{1}$ to $t_{2}$. Since the above distribution originates from the Gauss–Markov model (GMM) (Section 4 of [2] introduces the GP prior, the GMM generated by a linear time-varying stochastic differential equation (LTV-SDE), and the GP interpolation used in Section 5 for up-sampling), informed by the start and goal under a zero-acceleration assumption, we define the smooth functional

\mathcal{F}_{gp}(\bm{\theta}_{\mathbf{T}})=\frac{1}{2}\left(\bm{\theta}_{\mathbf{T}}-\bm{\mu}_{\mathbf{T}}\right)^{\top}\bm{\mathcal{K}}_{\mathbf{T}}^{-1}\left(\bm{\theta}_{\mathbf{T}}-\bm{\mu}_{\mathbf{T}}\right).   (1)
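For concreteness, a minimal NumPy sketch of (1) is given below. The flattened trajectory shape and the precomputed kernel inverse are assumptions of this illustration; the kernel itself would come from the LTV-SDE construction of [2].

```python
import numpy as np

def gp_prior_cost(theta, mu, K_inv):
    """Smoothness functional (1): 0.5 * (theta - mu)^T K^{-1} (theta - mu).

    theta, mu : flattened support trajectories of shape (N * dof,)
    K_inv     : precomputed inverse GP kernel, shape (N * dof, N * dof)
    Returns the cost and its gradient K^{-1} (theta - mu).
    """
    r = theta - mu
    grad = K_inv @ r
    return 0.5 * r @ grad, grad
```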

III-A2 Collision avoidance

Since a collision-free trajectory means a safe motion or a successful planning, we first utilize $\bm{x}(\mathcal{B}_{i},t)$ to map a state $\theta_{t}$ at time $t$ to the position of a collision-check ball (CCB-$\mathcal{B}_{i}$) on the manipulator. Then we calculate the collision cost $c(\bm{x})$, which increases as the distance between $\mathcal{B}_{i}$ and its closest obstacle decreases, and define the obstacle functional

\mathcal{F}_{obs}(\bm{\theta}_{\mathbf{T}})=\sum_{t\in\mathbf{T}}\sum_{\mathcal{B}_{i}\in\bm{\mathcal{B}}}c\left[\bm{x}(\mathcal{B}_{i},t)\right]\cdot\left\|\dot{\bm{x}}(\mathcal{B}_{i},t)\right\|,   (2)

where $\bm{\mathcal{B}}=\{\mathcal{B}_{1},\dots,\mathcal{B}_{|\bm{\mathcal{B}}|}\}$ consists of $|\bm{\mathcal{B}}|$ CCBs.
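A literal (and deliberately unoptimized) sketch of (2) follows; the forward-kinematics helpers fk_pos/fk_vel and the distance-field cost c are hypothetical callables standing in for the robot model and the signed distance field.

```python
import numpy as np

def obstacle_cost(theta, timestamps, ccbs, fk_pos, fk_vel, c):
    """Obstacle functional (2): sum_t sum_B c[x(B_i, t)] * ||x_dot(B_i, t)||.

    theta          : (N, dof) array of joint-space waypoints
    timestamps     : iterable of time indices t in T
    ccbs           : list of collision-check ball identifiers B_i
    fk_pos(q, B_i) : workspace position of ball B_i (assumed helper)
    fk_vel(q, B_i) : workspace velocity of ball B_i (assumed helper)
    c(x)           : scalar collision cost from the distance field
    """
    total = 0.0
    for t in timestamps:
        for ball in ccbs:
            x = fk_pos(theta[t], ball)
            x_dot = fk_vel(theta[t], ball)
            total += c(x) * np.linalg.norm(x_dot)
    return total
```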

(i) Stuck case and local minima: The functional (2) sums over discrete CCBs to calculate the cost and gradient. So sub-gradients with opposite directions may arise in a narrow workspace. In this case, the gradient of (2) approaches zero while the collision cost is still high. This is the so-called body-obstacle stuck case that results in local minima. According to Section IV-C1 and the objective functional (6), all local minima are unsafe trajectories because the non-convexity originates from the obstacle part (2) caused by the stuck case, rather than from the convex GP part (1).

(ii) Continuous-time safety: Though the above form combines the collision cost $c(\bm{x})$ with the change rate $\|\dot{\bm{x}}\|$ of the CCB state for continuous-time collision avoidance between two adjacent waypoints, it still cannot ensure safety when the obstacle allocation is sparse: the combination at a single time $t$ cannot precisely represent the obstacle functional over a continuous-time interval measured by an infinite number of timestamps including $t$. So the up-sampling method [2] is adopted for continuous-time safety. According to the GMM, we get $\bm{\Phi}(t+1,t)$ transforming $\theta_{t}$ to $\theta_{t+1}$ and

\bm{\Lambda}_{t,\tau}=\bm{\Phi}(t+\tfrac{\tau}{n_{t}^{\textit{ip}}+1},t)-\bm{\Psi}_{t,\tau}\bm{\Phi}(t+1,t),   (3)
\bm{\Psi}_{t,\tau}=\mathbf{Q}_{t,\tau}\bm{\Phi}(t+1,t+\tfrac{\tau}{n_{t}^{\textit{ip}}+1})^{\top}\mathbf{Q}_{t,1}^{-1},   (4)

where $\mathbf{Q}_{t,\tau}$ transforms the GP kernel from $t$ to $(t+\tfrac{\tau}{n_{t}^{\textit{ip}}+1})$ with $\tau\in\{1,\dots,n_{t}^{\textit{ip}}\}$, and $n_{t}^{\textit{ip}}$ is the number of intervals between two adjacent waypoints $\{\theta_{t},\theta_{t+1}\}$. Then we utilize $\bm{\Lambda}_{t}=[\bm{\Lambda}_{t,1}^{\top}\dots\bm{\Lambda}_{t,n_{t}^{\textit{ip}}}^{\top}]^{\top}$ and $\bm{\Psi}_{t}=[\bm{\Psi}_{t,1}^{\top}\dots\bm{\Psi}_{t,n_{t}^{\textit{ip}}}^{\top}]^{\top}$ to form the up-sampling matrix of the trajectory $\bm{\theta}_{\mathbf{T}}$ in the period $\mathbf{T}$:

\mathbf{M}=\left[\begin{array}{cccccccc}
\mathbf{I}&\mathbf{0}&\mathbf{0}&\dots&\dots&\dots&\mathbf{0}&\mathbf{0}\\
\bm{\Lambda}_{t_{1}}&\bm{\Psi}_{t_{1}}&\mathbf{0}&\dots&\dots&\dots&\mathbf{0}&\mathbf{0}\\
\vdots&\vdots&\ddots&&&&\vdots&\vdots\\
\mathbf{0}&\mathbf{0}&\dots&\mathbf{I}&\mathbf{0}&\dots&\mathbf{0}&\mathbf{0}\\
\mathbf{0}&\mathbf{0}&\dots&\bm{\Lambda}_{t}&\bm{\Psi}_{t}&\dots&\mathbf{0}&\mathbf{0}\\
\mathbf{0}&\mathbf{0}&\dots&\mathbf{0}&\mathbf{I}&\dots&\mathbf{0}&\mathbf{0}\\
\vdots&\vdots&&&&\ddots&\vdots&\vdots\\
\mathbf{0}&\mathbf{0}&\dots&\dots&\dots&\dots&\bm{\Lambda}_{t_{2}-1}&\bm{\Psi}_{t_{2}-1}\\
\mathbf{0}&\mathbf{0}&\dots&\dots&\dots&\dots&\mathbf{0}&\mathbf{I}
\end{array}\right]   (5)

for an up-sampled trajectory $\bm{\theta}^{\textit{up}}_{\mathbf{T}}=\mathbf{M}(\bm{\theta}_{\mathbf{T}}-\bm{\mu}_{\mathbf{T}})+\bm{\mu}^{\textit{up}}_{\mathbf{T}}$. Section IV-C1 will detail the generation of a stochastic gradient in the time scale by the random selection of $\{n^{\textit{ip}}_{t}\,|\,t\in\mathbf{T}\}$.
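The following sketch assembles a dense up-sampling matrix following the block pattern of (5) and applies it as $\bm{\theta}^{\textit{up}}=\mathbf{M}(\bm{\theta}-\bm{\mu})+\bm{\mu}^{\textit{up}}$. The per-interval blocks $\bm{\Lambda}_t$, $\bm{\Psi}_t$ are assumed to be precomputed from (3)-(4); a real implementation would exploit sparsity.

```python
import numpy as np

def upsample_matrix(Lambdas, Psis, dof):
    """Assemble the up-sampling matrix M of (5) from per-interval blocks.

    Lambdas[t], Psis[t] : arrays of shape (n_ip[t] * dof, dof) stacking the
                          blocks Lambda_{t,tau} and Psi_{t,tau} of (3)-(4)
                          for tau = 1 ... n_ip[t] (assumed precomputed).
    """
    I = np.eye(dof)
    n_support = len(Lambdas) + 1
    cols = n_support * dof
    rows = []
    for t in range(n_support - 1):
        row_sup = np.zeros((dof, cols))                 # support waypoint t
        row_sup[:, t * dof:(t + 1) * dof] = I
        row_ip = np.zeros((Lambdas[t].shape[0], cols))  # interpolated states
        row_ip[:, t * dof:(t + 1) * dof] = Lambdas[t]
        row_ip[:, (t + 1) * dof:(t + 2) * dof] = Psis[t]
        rows += [row_sup, row_ip]
    row_last = np.zeros((dof, cols))                    # final support waypoint
    row_last[:, -dof:] = I
    rows.append(row_last)
    return np.vstack(rows)

def upsample(theta, mu, mu_up, M):
    """Up-sampled trajectory: theta_up = M (theta - mu) + mu_up (flattened)."""
    return M @ (theta - mu) + mu_up
```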

III-A3 Objective functional

This paper adopts the objective functional of [1, 2] to generate an optimal trajectory $\bm{\theta}_{\mathbf{T}}$ that is highly smooth and collision-free:

\mathcal{F}(\bm{\theta}_{\mathbf{T}})=\varrho\,\mathcal{F}_{gp}(\bm{\theta}_{\mathbf{T}})+\mathcal{F}_{obs}(\bm{\theta}_{\mathbf{T}}^{\textit{up}}),   (6)

where $\varrho$ trades off between the GP prior (smoothness) and obstacle (collision-free) functionals. We refer to [1, 2] for the detailed derivation of its gradient.
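Reusing the helpers sketched above, the objective (6) can be assembled as below; the obstacle-cost callable over the up-sampled trajectory (and its gradient pulled back through $\mathbf{M}^{\top}$ as in (16)) is assumed to be provided elsewhere.

```python
def objective(theta, mu, mu_up, K_inv, M, rho, obs_cost_up):
    """Objective (6): rho * F_gp(theta) + F_obs(theta_up) (a sketch).

    obs_cost_up : callable returning the obstacle cost (2) of the
                  up-sampled trajectory (assumed helper).
    """
    gp_cost, _ = gp_prior_cost(theta, mu, K_inv)
    theta_up = upsample(theta, mu, mu_up, M)
    return rho * gp_cost + obs_cost_up(theta_up)
```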

III-B Bayes tree construction

The above has introduced the relation between the stuck case and local minima, how to ensure continuous-time safety, and how to transform the collision-free and smoothness requirements into the Lagrangian-form objective functional of $\bm{\theta}_{\mathbf{T}}$. According to the definition of $\mathbf{T}$, we can group all timestamps $\{1,\dots,N\}$ into it to optimize the whole trajectory with $N$ waypoints, or group some of them $\{t_{1},\dots,t_{2}\}$ for sub-trajectories. These sub-problems can be solved incrementally to elevate the efficiency of trajectory optimization, like iGPMP's incremental replanning [2].

The Bayes tree (BT) definition in [2] tells us that a single chain of conjugated waypoints constructs a BT with a set of nodes $\Theta=\{\theta_{t}\,|\,t=1\dots N\}$ and branches $\{[\theta_{t},\theta_{t+1}]\,|\,t=0\dots N\}$. In this way, we utilize (6) to calculate a BT-factor

\mathcal{F}_{t}^{\text{bt}}=\mathcal{F}(\bm{\theta}_{\mathbf{T}})\text{ with }\mathbf{T}=\{t-1,t,t+1\},   (7)

informed by a minimum sub-trajectory. Section IV-A will detail how to utilize it for incremental optimization.
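As an illustration, the BT-factor (7) can be evaluated by restricting (6) to the three-waypoint window around t; the windowed kernel inverse and obstacle-cost helper below are assumptions about how an implementation slices the GP.

```python
def bt_factor(t, theta, mu, K_inv_window, rho, obs_cost_window):
    """BT-factor (7): objective (6) restricted to T = {t-1, t, t+1}.

    theta, mu          : (N, dof) arrays of waypoints and GP means (0-indexed)
    K_inv_window       : inverse GP kernel of the 3-waypoint sub-trajectory
    obs_cost_window(s) : obstacle cost of the sub-trajectory s (assumed helper)
    """
    window = slice(t - 1, t + 2)
    r = (theta[window] - mu[window]).reshape(-1)
    gp_cost = 0.5 * r @ K_inv_window @ r
    return rho * gp_cost + obs_cost_window(theta[window])
```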

IV METHODOLOGY

IV-A Incremental optimization with mixed steps

Given the start $\theta_{0}$ and goal $\theta_{N+1}$, iSAGO (Algorithm 1, Figure 1) first interpolates the support waypoints between them linearly to gain an initial trajectory. Then it finds a series of collision-free waypoints facing two planning scenarios: (i) the convex case satisfying

\mathcal{C}_{\theta}=\left\{\bm{\theta}^{\prime}\,\middle|\,\tfrac{|\mathcal{F}(\bm{\theta}^{\prime})-\mathcal{F}(\bm{\theta})-\langle\bar{\nabla}\mathcal{F}(\bm{\theta}),\bm{\theta}^{\prime}-\bm{\theta}\rangle|}{\|\bm{\theta}^{\prime}-\bm{\theta}\|^{2}}\leq\tfrac{\mathcal{L}_{\mathcal{F}}}{2}\right\};   (8)

(ii) the non-convex case where the trajectory gets stuck in obstacles, causing the local minima.

Given these two cases, iSAGO mixes the accelerated and stochastic gradient information. AGD (Section IV-B) solves case (i) with an optimal convergence informed by the first-order accelerated gradient. On the other hand, STOMA (Section IV-C) utilizes the stochastic gradient to drag the trajectory from case (ii) into a convex subspace, i.e., case (i). Moreover, iSAGO uses the penalty method [31] to nest the above methods in PenIter for constrained optimization.

Input: initial $\bm{\theta}$, $\mathcal{GP}(\bm{\mu},\bm{\mathcal{K}})$, BT, and tradeoff $\varrho$.
Output: optimized trajectory $\bm{\theta}$.
1  for reTrajIter: $i=1\dots N_{uf}$ do  // incremental OMP
2      Get noisy set $\bm{\Theta}_{\mathfrak{N}}=\bigcup_{j=1}^{|\bm{\Theta}_{\mathfrak{N}}|}\bm{\theta}_{\mathfrak{N}}^{j}$ via (10);
3      if $\bm{\Theta}_{\mathfrak{N}}=\emptyset$ then return.
4      for $j=1\dots|\bm{\Theta}_{\mathfrak{N}}|$ do  // optimize $\bm{\Theta}_{\mathfrak{N}}$
5          Get sub-trajectory $\bm{\theta}_{\mathbf{T}}\leftarrow\bm{\theta}_{\mathfrak{N}}^{j}$ with $\mathbf{T}\leftarrow\mathbf{T}^{j}_{\mathfrak{N}}$;
6          for PenIter: $1\dots N_{\varrho}$ do  // penalty method
7              Check the body-obstacle stuck case according to Section IV-C1;
8              if isStuck then $\bm{\theta}_{\mathbf{T}}\leftarrow\texttt{STOMA}(\bm{\theta}_{\mathbf{T}},\mathcal{GP}(\bm{\mu}_{\mathbf{T}},\bm{\mathcal{K}}_{\mathbf{T}}),\varrho)$;
9              else $\bm{\theta}_{\mathbf{T}}\leftarrow\texttt{AGD}(\bm{\theta}_{\mathbf{T}},\mathcal{GP}(\bm{\mu}_{\mathbf{T}},\bm{\mathcal{K}}_{\mathbf{T}}),\varrho)$;
10             if $\mathcal{F}_{obs}(\bm{\theta})<\textit{obs}\text{tol}$ then break;
11             else $\varrho\leftarrow\kappa_{\varrho}\cdot\varrho$;
12         Update the BT-factors $\{\mathcal{F}^{\text{bt}}_{t}\,|\,t\in\mathbf{T}\}$;
Algorithm 1: iSAGO

Formally, our mixed method optimizes the whole trajectory to plan a safe motion. However, it is inefficient because the trajectory consists of collision-free and in-collision (no-stuck and in-stuck) parts, each requiring a different method. So we adopt iSAM2 [13] and build a BT (Section III-B) with factors (7). Since an optimal BT has no significant factor, incremental-SAGO (iSAGO) finds the significant factors based on the mean and standard deviation of all factors:

\mu_{\mathcal{F}}=\sum_{t=1\dots N}w_{t}\mathcal{F}_{t}^{\text{bt}},\quad\mathcal{D}_{\mathcal{F}}=\sqrt{\sum_{t=1\dots N}w_{t}\left(\mathcal{F}_{t}^{\text{bt}}-\mu_{\mathcal{F}}\right)^{2}},   (9)

where $w_{t}=1/N$ and $N$ is the number of waypoints. Next, iSAGO gains a set of waypoints with significant factors:

\Theta_{\mathfrak{N}}=\left\{\theta_{t}\,\middle|\,|\mathcal{F}_{t}^{\text{bt}}-\mu_{\mathcal{F}}|>c_{\eta}\mathcal{D}_{\mathcal{F}},\;t=1\dots N\right\},   (10)

where a smaller $c_{\eta}$ filters out more factors. Then it divides the whole trajectory into slices, each consisting of adjacent $\theta\in\Theta_{\mathfrak{N}}$. After inserting one $\theta\notin\Theta_{\mathfrak{N}}$ at the head and tail of each slice, we get $\bm{\Theta}_{\mathfrak{N}}=\bigcup_{j=1}^{|\bm{\Theta}_{\mathfrak{N}}|}\bm{\theta}_{\mathfrak{N}}^{j}$ and stamp each sub-trajectory $\bm{\theta}_{\mathfrak{N}}^{j}$ by $\mathbf{T}^{j}_{\mathfrak{N}}=\{t\,|\,\theta_{t}\in\bm{\theta}_{\mathfrak{N}}^{j}\}$. When all sub-trajectories are optimized in iSAGO.reTrajIter, iSAGO updates the BT and $\bm{\Theta}_{\mathfrak{N}}$ for the incremental optimization.
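A minimal sketch of this bookkeeping, under the equal-weight choice $w_t=1/N$, is given below; the padding and grouping conventions are assumptions matching the description above.

```python
import numpy as np

def noisy_subtrajectories(bt_factors, c_eta=2.0):
    """Select significant BT-factors via (9)-(10) and slice them into
    contiguous sub-trajectories, padding each slice with one extra
    waypoint at its head and tail (a sketch of the reTrajIter step).

    bt_factors : sequence of F_t^bt for t = 1 ... N
    Returns a list of index sets T_N^j; an empty list signals convergence.
    """
    f = np.asarray(bt_factors, dtype=float)
    N = len(f)
    mu_f = f.mean()                                                # (9), w_t = 1/N
    d_f = np.sqrt(((f - mu_f) ** 2).mean())
    significant = np.flatnonzero(np.abs(f - mu_f) > c_eta * d_f)   # (10)
    if significant.size == 0:
        return []
    slices, start, prev = [], significant[0], significant[0]
    for idx in significant[1:]:
        if idx != prev + 1:                                        # gap closes the slice
            slices.append((start, prev))
            start = idx
        prev = idx
    slices.append((start, prev))
    # pad each slice with one waypoint at its head and tail, clipped to the ends
    return [list(range(max(a - 1, 0), min(b + 1, N - 1) + 1)) for a, b in slices]
```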

IV-B Accelerated Gradient Descent

This paper gathers the accelerated gradient information for optimization because it only requires the first-order momenta to achieve an optimal convergence [32] under the Lipschitz-continuous-gradient condition (8). So we adopt the descent rules of [19]:

\alpha_{k}=\frac{2}{k+1},\quad\beta_{k}=\frac{1}{2\mathcal{L}_{\mathcal{F}}},\quad\lambda_{k}=\frac{k\beta_{k}}{2},   (11)

then for all $n>1$ we have (see [19] for details: Lemma 1 provides an analytic view of the damped descent of Ghadimi's AGD, and Theorem 1 provides AGD's convergence rate, whose proof validates the accelerated descent process)

\min_{k=1\dots n}\|\bar{\nabla}\mathcal{F}(\bm{\theta}^{md}_{k})\|^{2}\leq\frac{96\,\mathcal{L}_{\mathcal{F}}\|\bm{\theta}^{*}-\bm{\theta}_{0}\|^{2}}{n^{2}(n+1)}.   (12)

Some former studies [18, 32, 19] set a fixed Lipschitz constant $\mathcal{L}_{\mathcal{F}}$ for AGD. However, (12) indicates a strong correlation between the super-linear convergence rate and $\mathcal{L}_{\mathcal{F}}$. Recent studies [20, 33] introduce restart schemes for speed-up when the problem is not globally convex. Algorithm 2 adopts the trust-region method [15] to adjust $\mathcal{L}_{\mathcal{F}}$ with factor $\kappa_{\mathcal{L}}$ and restart AGD until the condition

|\mathcal{F}_{k}-\mathcal{F}_{k-1}-\langle\bar{\nabla}\mathcal{F}_{k}^{md},\bm{\theta}_{k}-\bm{\theta}_{k-1}\rangle|\leq\tfrac{\mathcal{L}_{\mathcal{F}}}{2}\|\bm{\theta}_{k}-\bm{\theta}_{k-1}\|^{2}   (13)

is satisfied. (This paper abbreviates $\mathcal{F}(\bm{\theta}^{md}_{k})$, $\bar{\nabla}\mathcal{F}(\bm{\theta}^{md}_{k})$, $\mathcal{F}(\bm{\theta}^{ag}_{k})$ and $\mathcal{F}(\bm{\theta}_{k})$ as $\mathcal{F}^{md}_{k}$, $\bar{\nabla}\mathcal{F}^{md}_{k}$, $\mathcal{F}^{ag}_{k}$ and $\mathcal{F}_{k}$, and writes $\mathcal{F}^{\text{bt}}_{i}$ for the BT-factor at waypoint $\theta_{i}$.) In this way, we arbitrarily initialize $\mathcal{L}_{\mathcal{F}}$ as $\updelta_{0}\|\bar{\nabla}\mathcal{F}(\bm{\theta}_{0})\|$ and update it for a robust convergence.

Input: initial $\bm{\theta}_{0}$, penalty $\varrho$, and $\mathcal{GP}(\bm{\mu},\bm{\mathcal{K}})$.
Output: optimized $\bm{\theta}^{md}$.
1  Initialize: $\bm{\theta}_{0}^{ag}=\bm{\theta}_{0}$, $\bar{\nabla}\mathcal{F}_{1}^{md}=\bar{\nabla}\mathcal{F}_{0}$, $\mathcal{L}_{\mathcal{F}}=\updelta_{0}\|\bar{\nabla}\mathcal{F}_{0}\|$;
2  for $\mathcal{L}$Iter: $j=1\dots N_{\mathcal{L}}$ do  // adjust $\mathcal{L}_{\mathcal{F}}$
3      for AgdIter: $k=1\dots N_{ag}$ do  // AGD
4          $\bm{\theta}_{k}^{md}=(1-\alpha_{k})\,\bm{\theta}_{k-1}^{ag}+\alpha_{k}\bm{\theta}_{k-1}$;
5          if $k\geq 2$ then $\bar{\nabla}\mathcal{F}_{k}^{md}\leftarrow\bar{\nabla}\mathcal{F}(\bm{\theta}_{k}^{md})$;
6          Update $\alpha_{k}$, $\beta_{k}$ and $\lambda_{k}$ by (11);
7          $\bm{\theta}_{k}=\bm{\theta}_{k-1}-\lambda_{k}\bar{\nabla}\mathcal{F}_{k}^{md}$;
8          $\bm{\theta}_{k}^{ag}=\bm{\theta}_{k}^{md}-\beta_{k}\bar{\nabla}\mathcal{F}_{k}^{md}$;
9          if Converged (14) or isStuck then return.
10         if condition (13) is not satisfied then
11             $\{\bm{\theta}^{md}_{0},\bm{\theta}^{ag}_{0},\bm{\theta}_{0}\}\leftarrow\bm{\theta}^{md}_{k-1}$; $\mathcal{L}_{\mathcal{F}}\leftarrow\kappa_{\mathcal{L}}\cdot\mathcal{L}_{\mathcal{F}}$; $\bar{\nabla}\mathcal{F}_{1}^{md}\leftarrow\bar{\nabla}\mathcal{F}_{k-1}^{md}$; break
Algorithm 2: AGD

The AGD in Algorithm 2 executes until it converges,

|\mathcal{F}_{k}-\mathcal{F}_{k-1}|<\mathcal{F}\text{tol},\quad\|\bm{\theta}_{k}-\bm{\theta}_{k-1}\|<\theta\text{tol},   (14)

or detects the stuck case according to Section IV-C1.
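The sketch below condenses Algorithm 2 into plain Python under the rules (11), the restart test (13), and the convergence test (14); the cost/gradient callables, the stuck-case predicate, and the parameter defaults are assumptions of this illustration.

```python
import numpy as np

def agd(theta0, cost, grad, L0, kappa_L=6.67, n_restart=10, n_agd=50,
        f_tol=8e-4, theta_tol=1e-3, is_stuck=lambda g: False):
    """Accelerated gradient descent with trust-region-style restarts (a sketch)."""
    L = L0
    theta = theta0.copy()
    theta_ag = theta0.copy()
    for _ in range(n_restart):                        # adjust L_F
        theta_prev, f_prev = theta.copy(), cost(theta)
        for k in range(1, n_agd + 1):                 # AGD inner loop
            alpha = 2.0 / (k + 1)                     # (11)
            beta = 1.0 / (2.0 * L)
            lam = k * beta / 2.0
            theta_md = (1 - alpha) * theta_ag + alpha * theta
            g = grad(theta_md)
            if is_stuck(g):
                return theta_md                       # hand over to STOMA
            theta_new = theta - lam * g
            theta_ag = theta_md - beta * g
            f_new = cost(theta_new)
            step = theta_new - theta_prev
            # restart test (13): the L_F-Lipschitz model is violated
            if abs(f_new - f_prev - g @ step) > 0.5 * L * step @ step:
                theta = theta_md.copy()
                theta_ag = theta_md.copy()
                L *= kappa_L
                break
            # convergence test (14)
            if abs(f_new - f_prev) < f_tol and np.linalg.norm(step) < theta_tol:
                return theta_new
            theta, theta_prev, f_prev = theta_new, theta_new, f_new
        else:
            return theta
    return theta
```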

IV-C Stochastic Optimization with Momenta Adaptation

Some studies in deep learning [34, 24, 23, 35] propose stochastic gradient descent (SGD) for learning objectives with noisy or sparse gradients. They optimize an objective function consisting of loosely coupled sub-functions. Because of the higher robustness of Adam [23] under function variation, we adopt its moment-adaptation method and propose STOMA (Algorithm 3, Figure 2(b)) for the stuck case. This section first illustrates how to generate a stochastic gradient (SG) and detect the stuck case, and then introduces how to accelerate SGD with momentum adaptation.

Input: initial $\bm{\theta}_{0}$, $\varrho$ and $\mathcal{GP}(\bm{\mu},\bm{\mathcal{K}})$.
Output: optimized $\bm{\theta}^{sg}$.
1  for reSgdIter: $j=1\dots N_{\textit{rsg}}$ do  // restart SGD
2      Sample $\bm{\Theta}_{K}=\bm{\theta}_{0}\cup\{\bm{\theta}_{k}\sim\mathcal{GP}(\bm{\mu},\bm{\mathcal{K}})\,|\,k=1\dots K\}$;
3      Evaluate costs $\mathfrak{F}_{K}=\{\mathcal{F}[\bm{\theta}_{k}]\,|\,k=0,1,\dots,K\}$;
4      Initialize $k=0$, $\{\bm{\theta}_{0},\bm{\theta}_{0}^{ag}\}\leftarrow\arg\min_{\bm{\theta}\in\bm{\Theta}_{K}}\mathfrak{F}_{K}$;
5      while True do  // adaptive SGD (SgdIter)
6          $k\leftarrow k+1$; Update $N_{\textit{sg}}\sim\mathcal{U}(N_{lo}^{\textit{sg}},N_{up}^{\textit{sg}})$;
7          $\bm{\theta}_{k}^{sg}=(1-\alpha_{k})\,\bm{\theta}_{k-1}^{ag}+\alpha_{k}\bm{\theta}_{k-1}$;
8          Get $\bar{\nabla}\mathcal{F}^{sg}_{k}$ by (15); check the stuck case;
9          if not isStuck then return.
10         if $k\geq N_{\textit{sg}}$ or $\|\bm{\theta}_{k}-\bm{\theta}_{k-1}\|\leq\textit{SG}\text{tol}$ then
11             Update $\bm{\theta}_{0}=\bm{\theta}_{k}^{sg}$; break
12         Estimate the 2nd moment $\bm{\mathfrak{M}}_{k}$ via (18);
13         Update $\alpha_{k}$, $\bm{\mathfrak{B}}_{k}$ and $\lambda_{k}$ via (19);
14         $\bm{\theta}_{k}=\bm{\theta}_{k-1}-\lambda_{k}\bar{\nabla}\mathcal{F}^{sg}_{k}$;
15         $\bm{\theta}_{k}^{ag}=\bm{\theta}_{k}^{sg}-\bm{\mathfrak{B}}_{k}\bar{\nabla}\mathcal{F}^{sg}_{k}$;
Algorithm 3: STOMA
(a) A series of frames illustrates how the human arm reacts to obstacles when grasping inside a bookshelf. The green, orange, and red balls show no-risk, low-risk, and high-risk collision areas.
(b) STOMA drags the robot-arm out of the body-obstacle interference via adaptive stochastic descent. The green CCBs are selected for stochastic momenta (green arrow), while the red ones are abandoned.
Figure 2: The human-arm avoidance stimulated by the forces from the subparts inspires the robot-arm avoidance converged by stochastic gradient descent.

IV-C1 SG generation and Stuck case check

Since (6) reveals that the whole objective functional uses the penalty factor $\varrho$ to gather the GP prior and the obstacle information, which accumulates the sub-functions calculated by collision-check balls (CCBs) in discrete time, we generate an SG

\bar{\nabla}\mathcal{F}^{sg}=\hat{\varrho}\,\bar{\nabla}\mathcal{F}_{gp}+\bar{\nabla}\mathcal{F}_{\textit{obs}}^{sg}   (15)

in the Functional, Time, and Space scales:

(i) Functional: We select the penalty factor via $\frac{1}{\hat{\varrho}}\sim\mathcal{U}(0,\frac{1}{\varrho})$.

(ii) Time: We select a set $\{n_{t}^{\textit{ip}}\sim\mathcal{U}(0,N^{\textit{ip}})\,|\,t\in\mathbf{T}\}$, whose elements are the numbers of intervals between two support waypoints $[\theta_{t}^{\top},\theta_{t+1}^{\top}]^{\top}$, to gain a sub-gradient for collision check:

\bar{\nabla}\mathcal{F}_{\textit{obs}}^{sg}=\mathbf{M}^{\top}\cdot\mathbf{g}_{up},   (16)

where (5) generates the up-sampling matrix $\mathbf{M}$ with the set $\{n_{t}^{\textit{ip}}\,|\,t\in\mathbf{T}\}$ to map the randomly up-sampled gradient

\mathbf{g}_{up}=\left[\begin{array}{c}\nabla^{\top}\mathcal{F}_{\textit{obs}}^{\bm{\mathcal{B}}}\left[\theta(t_{1})\right],\\ \nabla^{\top}\mathcal{F}_{\textit{obs}}^{\bm{\mathcal{B}}}\left[\theta(t_{1}+\tfrac{1}{n_{t_{1}}^{\textit{ip}}+1})\right],\dots,\nabla^{\top}\mathcal{F}_{\textit{obs}}^{\bm{\mathcal{B}}}\left[\theta(t_{1}+1)\right],\\ \cdots\\ \nabla^{\top}\mathcal{F}_{\textit{obs}}^{\bm{\mathcal{B}}}\left[\theta(t_{2}-\tfrac{n_{t_{2}-1}^{\textit{ip}}}{n_{t_{2}-1}^{\textit{ip}}+1})\right],\dots,\nabla^{\top}\mathcal{F}_{\textit{obs}}^{\bm{\mathcal{B}}}\left[\theta(t_{2})\right]\end{array}\right]^{\top}

into the gradient $\bar{\nabla}\mathcal{F}_{\textit{obs}}^{sg}$ of $\bm{\theta}_{\mathbf{T}}$ with $\mathbf{T}=\{t_{1},\dots,t_{2}\}$.

(iii) Space: The intuition behind the Space rule comes from the collision avoidance of a human arm at a bookshelf in Figure 2(a). When a participant stretches for an object, he makes sequential reflexes in response to the body-obstacle collision, stimulated by the exterior forces (orange arrows) acting on the endangered parts (orange/red balls). It indicates that the collision risk rises from shoulder to hand because the number of orange/red balls rises from shoulder to hand. So we first reconstruct each rigid body of a tandem manipulator by geometrically connected CCBs. Next, we build each sub-problem (i.e., the collision avoidance of a single CCB) from shoulder to end-effector, considering the risk variation. Then $\nabla\mathcal{F}^{\bm{\mathcal{B}}}_{\textit{obs}}$ is calculated by accumulating the CCB gradients from $\mathcal{B}_{1}$ near the shoulder up to $\mathcal{B}_{k}$ near the end-effector:

\nabla\mathcal{F}_{obs}^{\bm{\mathcal{B}}}(\theta_{t})=\nabla\mathcal{F}_{obs}^{|\bm{\mathcal{B}}|}(\theta_{t}),\quad\nabla\mathcal{F}_{obs}^{k}=\sum_{\mathcal{B}_{i}\in\bm{\mathcal{B}}_{k},\,\phi_{i}\leq\phi_{\text{tol}}}\nabla_{\bm{q}_{i}}\mathcal{F}_{obs},\quad\bm{\mathcal{B}}_{k}=\{\mathcal{B}_{1},\dots,\mathcal{B}_{k}\},   (17)

where $\bm{q}_{i}$ contains the joints actuating CCB-$\mathcal{B}_{i}$, and $\phi_{i}$ is the included angle between $\mathcal{B}_{i}$'s gradient $\nabla_{\bm{q}_{i}}\mathcal{F}_{obs}$ and the accumulated gradient $\nabla\mathcal{F}_{obs}^{i-1}$ (from $\mathcal{B}_{1}$ to $\mathcal{B}_{i-1}$). The random tolerance angle $\phi_{\text{tol}}$ generates the SG in the Space scale by randomly rejecting $\nabla_{\bm{q}_{i}}\mathcal{F}_{obs}$ with $\phi_{i}>\phi_{\text{tol}}$, meaning that the space stochasticity vanishes when $\phi_{\text{tol}}\rightarrow 180\degree$.

(iv) Stuck case: Unlike the robot arm, which the SG drags out of the interference (Figure 2(b)), the human arm never gets stuck during avoidance (Figure 2(a)); still, we can imagine how confused a human-arm reflex would be if a pair of opposite forces stimulated it. So we select a constant $\phi_{\text{tol}}$ to detect the stuck case that confuses the collision-free planning. Figure 3 visualizes how an arm is stuck in an obstacle when $\exists\,\phi_{i}>\phi_{\text{tol}}$ and how a sub-trajectory containing a series of arm configurations is stuck. We do not adopt a pair-wise check because the risk varies among different parts, and a stuck check with a constant $\phi_{\text{tol}}$ synchronizes more efficiently with the SG generation driven by a random $\phi_{\text{tol}}$.
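A sketch of the shoulder-to-end-effector accumulation (17) and the angle-based check is given below; the per-CCB joint-space gradients (zero-padded to the full dof) are an assumed input, and the same routine serves both purposes depending on whether phi_tol is drawn randomly (SG generation) or held constant (stuck-case detection).

```python
import numpy as np

def space_scale_gradient(ccb_grads, phi_tol_deg):
    """Space rule (17): accumulate per-CCB gradients from shoulder to
    end-effector, rejecting any ball whose gradient deviates from the
    running sum by more than phi_tol (a sketch).

    ccb_grads : list of joint-space gradients nabla_{q_i} F_obs, ordered
                from B_1 (shoulder) to B_k (end-effector), zero-padded.
    Returns the accumulated gradient and a stuck-case flag.
    """
    acc = np.zeros_like(ccb_grads[0])
    stuck = False
    for g in ccb_grads:
        denom = np.linalg.norm(acc) * np.linalg.norm(g)
        if denom < 1e-12:
            acc = acc + g                   # no meaningful included angle yet
            continue
        cos_phi = np.clip(acc @ g / denom, -1.0, 1.0)
        if np.degrees(np.arccos(cos_phi)) > phi_tol_deg:
            stuck = True                    # opposing sub-gradient detected
            continue                        # random phi_tol: reject this ball
        acc = acc + g
    return acc, stuck

# A constant phi_tol (95 deg in Section V-B2) flags the stuck case, while a
# random phi_tol ~ U(60 deg, 180 deg) generates the space-scale SG.
```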

Figure 3: The grey CCBs assemble an LBR-iiwa, while the red ones denote an in-collision case. The stuck case occurs when the included angle $\phi>\phi_{\text{tol}}$.

IV-C2 Update rules

Adam [23] adopts the element-wise product of gradients, uses an EMA from the initial step up to the current step, and applies bias correction for adaptive momentum estimation. Our work uses the squared $\ell_{2}$-norm of the SG to estimate $\hat{\bm{\mathfrak{M}}}_{1}=\|\bar{\nabla}\mathcal{F}^{sg}_{1}\|^{2}$ and adopts EMA and bias correction to estimate the 2nd momenta $\{\bm{\mathfrak{M}}_{k}\}$:

\hat{\bm{\mathfrak{M}}}_{k}=\gamma\hat{\bm{\mathfrak{M}}}_{k-1}+(1-\gamma)\,\bar{\nabla}{\mathcal{F}^{sg}_{k}}^{\odot 2},\quad\bm{\mathfrak{M}}_{k}=\frac{\hat{\bm{\mathfrak{M}}}_{k}}{1-\gamma^{k}}.   (18)

As shown in Algorithm 3, we adopt the AGD rules [19] rather than the EMA with bias correction for the first-momentum adaptation. The update rules of $\{\alpha_{k},\bm{\mathfrak{B}}_{k},\lambda_{k}\}$ are also reset as

\alpha_{k}=\frac{2}{k+1},\quad\bm{\mathfrak{B}}_{k}=\frac{\bm{\mathfrak{M}}_{k}^{-\frac{1}{2}}}{2\delta},\quad\frac{\lambda_{k}}{|\bm{\mathfrak{B}}_{k}|}\in\left[1,1+\frac{\alpha_{k}}{4}\right].   (19)
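One adaptive step under (18)-(19) might look as follows; the element-wise scaling and the conservative scalar choice for $|\bm{\mathfrak{B}}_{k}|$ when setting $\lambda_{k}$ are assumptions of this sketch.

```python
import numpy as np

def stoma_step(k, theta, theta_ag, grad_sg, m_hat, gamma=0.9, delta=0.4):
    """One adaptive stochastic step of STOMA per (18)-(19) (a sketch).

    grad_sg(x) : stochastic gradient (15) evaluated at the mid point x
    m_hat      : running EMA of the squared stochastic gradient
    """
    alpha = 2.0 / (k + 1)
    theta_sg = (1 - alpha) * theta_ag + alpha * theta
    g = grad_sg(theta_sg)
    m_hat = gamma * m_hat + (1 - gamma) * g ** 2        # (18), EMA of squared SG
    m = m_hat / (1 - gamma ** k)                        # (18), bias correction
    B = 1.0 / (2.0 * delta * (np.sqrt(m) + 1e-12))      # (19), element-wise B_k
    lam = (1 + alpha / 4) * float(np.min(np.abs(B)))    # lambda_k/|B_k| in [1, 1+alpha/4]
    theta_new = theta - lam * g                         # lagged (conservative) step
    theta_ag_new = theta_sg - B * g                     # accelerated step
    return theta_new, theta_ag_new, m_hat
```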

In this way, the descent process, approximately bounded by a $\delta$-sized trust region, accelerates via the $\alpha$-linear interpolation between the conservative step and the accelerated step, driven by $\{\lambda_{k}\}$ and $\{\bm{\mathfrak{B}}_{k}\}$ respectively. As $\{\alpha_{k}\}$ varies, the process shifts from the lagged state guided by $\bm{\theta}$ to the accelerated one guided by $\bm{\theta}^{ag}$. Since the SGD is roughly Lipschitz continuous, we get

\mathcal{F}_{k-1}\leq\mathcal{F}_{k}+\langle\bar{\nabla}\mathcal{F}_{k-1},\bm{\theta}_{k}-\bm{\theta}_{k-1}\rangle\leq\mathcal{F}_{k}+\lambda_{k}\|\bar{\nabla}\mathcal{F}_{k}^{sg}\|^{2}+\lambda_{k}\|\Delta_{k}\|\,\|\bar{\nabla}\mathcal{F}_{k}^{sg}\|,

where $\mathcal{F}_{k}$ and $\mathcal{F}_{k}^{sg}$ abbreviate $\mathcal{F}(\bm{\theta}_{k})$ and $\mathcal{F}(\bm{\theta}_{k}^{sg})$, and

\|\Delta_{k}\|=\|\bar{\nabla}\mathcal{F}_{k-1}-\bar{\nabla}\mathcal{F}_{k}^{sg}\|\leq\mathcal{L}_{\mathcal{F}}^{sg}(1-\alpha_{k})\|\bm{\theta}_{k-1}^{ag}-\bm{\theta}_{k-1}\|,
\bm{\theta}_{k}^{ag}-\bm{\theta}_{k}=(1-\alpha_{k})(\bm{\theta}_{k-1}^{ag}-\bm{\theta}_{k-1})+(\lambda_{k}-\bm{\mathfrak{B}}_{k})\bar{\nabla}\mathcal{F}_{k}^{sg}

according to the interpolation and update steps of Algorithm 3. Combining the Cauchy–Schwarz inequality, we get

\mathcal{F}_{k-1}\leq\mathcal{F}_{k}+\lambda_{k}\left(1+\tfrac{\mathcal{L}_{\mathcal{F}}^{sg}\lambda_{k}}{2}\right)\|\bar{\nabla}\mathcal{F}_{k}^{sg}\|^{2}+\tfrac{\mathcal{L}_{\mathcal{F}}^{sg}(1-\alpha_{k})^{2}}{2}\|\bm{\theta}_{k-1}^{ag}-\bm{\theta}_{k-1}\|^{2}.

Then through the accumulation from $k=1$ to $N$, we get

\frac{\mathcal{F}_{0}-\mathcal{F}_{N}}{\mathcal{L}_{\mathcal{F}}^{sg}}\leq\sum_{k=1}^{N}\left(\lambda_{k}+\frac{\lambda_{k}^{2}}{2}+\frac{(\lambda_{k}-\beta_{k})^{2}}{2\Gamma_{k}\alpha_{k}}\sum_{l=k}^{N}\Gamma_{l}\right)\|\bar{\nabla}\mathcal{F}_{k}^{sg}\|^{2},

where $\beta_{k}=|\bm{\mathfrak{B}}_{k}|_{\infty^{-}}$ and $\Gamma_{l}=(1-\alpha_{l})\Gamma_{l-1}$ with $\Gamma_{0}=1$.

In view of the stochasticity of $\mathcal{F}(\bm{\theta}^{sg})$ and its gradient $\bar{\nabla}\mathcal{F}^{sg}$, and presuming $\|\bar{\nabla}\mathcal{F}_{k}\|\,c_{l}\,\delta\geq\mathcal{L}_{\mathcal{F}}^{sg}$, we get

\frac{\mathcal{F}_{0}-\mathcal{F}_{N}}{N}\leq\frac{5\mathbf{G}_{\infty}}{4\delta}\left(1+\frac{11c_{l}}{32}\right)\frac{\ln(N+1)}{N},   (20)

where $c_{l}$ is the fraction between $\mathcal{L}^{sg}_{\mathcal{F}}$ and the trust-region box $\delta\cdot\|\bar{\nabla}\mathcal{F}_{k}\|$, and $\mathbf{G}_{\infty}\leq\mathds{E}(\|[\bar{\nabla}^{\top}\mathcal{F}_{1}^{sg},\dots,\bar{\nabla}^{\top}\mathcal{F}_{N}^{sg}]\|_{\infty})$.

Figure 2(b) illustrates how a robot reacts to the stuck case like the human in Figure 2(a), following the update rules in SgdIter. To improve the efficiency and reliability, we nest the above SGD (i.e., SgdIter) in reSgdIter. Once the SGD is restarted, we select the trajectory with the lowest cost from the sample set $\bm{\Theta}_{K}$ to reinitialize the next SGD, whose maximum iteration number $N_{\textit{sg}}$ is selected randomly in $[N_{lo}^{\textit{sg}},N_{up}^{\textit{sg}}]$.
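A sketch of the restart step in reSgdIter is given below; drawing prior samples via a multivariate normal and the uniform iteration budget are assumptions matching Algorithm 3 and the parameter choices in Section V-B2.

```python
import numpy as np

def restart_sgd(theta_current, mu, K_cov, cost, rng,
                n_samples=12, n_sg_lo=35, n_sg_up=55):
    """reSgdIter restart (a sketch): re-seed the inner SGD from the best of
    the current iterate and K trajectories sampled from the GP prior, and
    draw a random iteration budget N_sg ~ U(N_lo, N_up)."""
    candidates = [np.asarray(theta_current)]
    candidates += list(rng.multivariate_normal(mu, K_cov, size=n_samples))
    costs = [cost(th) for th in candidates]
    theta0 = candidates[int(np.argmin(costs))]
    n_sg = int(rng.integers(n_sg_lo, n_sg_up + 1))
    return theta0, n_sg

# Usage: rng = np.random.default_rng(0); theta0, n_sg = restart_sgd(...)
```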

V EXPERIMENT

This section will first introduce the objective functional implementation in Section V-A. Then it will detail the benchmark and parameter setting in Sections V-B1 & V-B2 and analyze the benchmark results in Section V-B3.

V-A Implementation details

V-A1 GP prior

We assume the velocity is constant and gain the GMM and GP prior (1) to reduce the dimension. Our benchmark uses $N=12$ support states with $N^{\textit{ip}}=8$ intervals (116 states in total) for iSAGO and GPMP2, 116 states for STOMP and CHOMP, 12 and 58 states for TrajOpt (since TrajOpt proposes a swept-out area between support waypoints for continuous-time safety, we choose TrajOpt-12 for safety validation and TrajOpt-58 to ensure the dimensional consistency of the benchmark), and MaxConnectionDistance $=\frac{\|\theta_{g}-\theta_{0}\|}{12}$ for RRT-Connect. All implementations except RRT-Connect apply linear interpolation rather than manual selection to initialize the trajectory for a fair comparison.

V-A2 Collision cost function

We adopt the polynomial piecewise form of [1] for the collision cost $c(\bm{x})$ in (2).

V-A3 Motion constraints

The motion cost function in GPMP [2] is adopted to drive the robot along the preplanned optimal collision-free trajectory in the real world.

V-B Evaluation

V-B1 Setup for benchmark

This paper benchmarks iSAGO against the numerical planners (CHOMP [1], TrajOpt [3], GPMP2 [2]) and the sampling planners (STOMP [8], RRT-Connect [7]) on a 7-DoF robot (LBR-iiwa) and a 6-DoF robot (AUBO-i5). Since the benchmark executes in MATLAB, we use BatchTrajOptimize3DArm of the GPMP2 toolbox to implement GPMP2; plannarBiRRT with a 20 s maximum time for RRT-Connect; fmincon for TrajOpt (configured with optimoptions('fmincon','Algorithm','trust-region-reflective','SpecifyObjectiveGradient',true) to apply the trust-region method like TrajOpt, where the objective gradient is calculated analytically and Aeq, a matrix whose rows are the constraint gradients, by numerical differentiation); hmcSampler for CHOMP (an hmcSampler object whose log probability density function logpdf is defined by (6), using hmcSampler.drawSamples for the HMC adopted by CHOMP); and mvnrnd for STOMP (which samples the noisy trajectories by mvnrnd and updates the trajectory via projected weighted averaging). Since all of them are highly tuned in their own studies, our benchmark uses their default settings.

To illustrate the competence of iSAGO for planning tasks, we conduct 25 experiments on LBR-iiwa at a bookshelf and 12 experiments on AUBO-i5 at a storage shelf. We categorize all tasks into 3 classes (A, B, and C) whose planning difficulty rises as the number of stuck cases in the initial trajectory increases. Figure 4 visually validates our classification because the number of red in-collision CCBs increases from Task A-1 to Task C-3. Considering the difficulty of the different classes, we first generate 2 tasks of class A, 3 tasks of class B, and 4 tasks of class C for LBR-iiwa (Figures 4(a)-4(c)). Then we generate 6 tasks of class C for AUBO-i5 (Figure 4(d)). Each task consists of 2 to 4 problems with the same initial state and goal constraint. (Since LBR-iiwa has 7 DoFs, meaning infinite solutions for the same goal constraint, our benchmark uses LM [14] with random initial points to generate different goal states; so one planning task of LBR-iiwa with the same goal constraint contains several problems with different goal states. Meanwhile, each task of AUBO-i5 has two problems with the same initial state and goal constraint because the same goal constraint has only one solution for the 6-DoF robot, and we switch the yaw angle between $\{0\degree,180\degree\}$ for each.) Moreover, we compare iSAGO with SAGO to show the efficiency of the incremental method and compare SAGO with STOMA or AGD to show the efficiency or reliability of the mixed optimization.

(a) Task A-1, 6 red CCBs
(b) Task B-1, 11 red CCBs
(c) Task C-3, 22 red CCBs
(d) Task C-6, 35 red CCBs
Figure 4: The initial trajectory with red collision parts visualizes our benchmark on LBR-iiwa or AUBO-i5. Task C-6 means the No.6 task of class C.
(a) Trajectory C-3.2
(b) Trajectory C-6.2
Figure 5: Some results of iSAGO in class C (Trajectory C-3.2 means the 2nd planning result of task 3 in class C).

V-B2 Parameter setting

Since STOMA (Algorithm 3, Figure 2(b)) performs SGD roughly contained in a $\delta$-sized trust region with a $\gamma$-damped EMA of the historical momenta, this section first tunes these two parameters, then the others.

Based on extensive experience, we tune the key parameters $\delta\in[0.04,0.80]$ and $\gamma\in[0.50,0.99]$ and conduct 10 different class C tasks for each pair of parameters. Table I indicates that the SGD rate increases as $\delta$ expands, while the success rate decreases with an excessive $\delta$ (either too high or too low). Moreover, as $\gamma$ shrinks, the SGD rate increases while the success rate decreases. Overall, a smaller step size $\delta$ makes iSAGO more stable but hampers its ability to overcome local minima, while combining fewer historical momenta makes it faster but less reliable. So we set $\delta=0.40$ and $\gamma=0.90$.

TABLE I: Results for $\{\delta,\gamma\}$ tuning on 10 tasks in class C

            delta = 0.80   delta = 0.40   delta = 0.08   delta = 0.04
gamma 0.50   80 | 1.253     90 | 1.898     60 | 4.198     40 | 6.309
gamma 0.90   90 | 2.512    100 | 2.463     80 | 4.973     60 | 8.219
gamma 0.99   90 | 2.672    100 | 3.089     80 | 7.131     60 | 12.57

Each cell reads "success rate (%) | average computation time (s)".

Besides $\delta$ and $\gamma$, we set the initial penalty $\varrho=1.25\text{e-}2$, penalty factor $\kappa_{\varrho}=0.4$, and filter factor $c_{\eta}=2.0$ in Algorithm 1. In Algorithm 2, we set the tolerances $\mathcal{F}\text{tol}=8\text{e-}4$, $\theta\text{tol}=1\text{e-}3$, $\textit{obs}\text{tol}=1\text{e-}4$ and the scaling factor $\kappa_{\mathcal{L}}=6.67$. In Algorithm 3, we set the number of samples $K=12$, the tolerance $\textit{SG}\text{tol}=6.4\text{e-}3$, and the number of SgdIter $N_{\textit{sg}}\sim\mathcal{U}(35,55)$. Moreover, we select $\phi_{\text{tol}}=95\degree$ to check the stuck case and $\phi_{\text{tol}}\sim\mathcal{U}(60\degree,180\degree)$ to generate the SG.

V-B3 Result analysis

TABLE II: Results of 15 tasks (44 planning problems) with 5 repeated tests

                  | Our incremental mixed optimization | Numerical optimization                      | Probabilistic sampling
        Problem   | AGD     STOMA   SAGO    iSAGO      | TrajOpt-12  TrajOpt-58  GPMP2-12  CHOMP     | STOMP   RRT-Connect
Scr(%)  iiwa_AB   | 75      97.5    100     100        | 27.5        77.5        81.25     87.5      | 82.5    100
        iiwa_C    | 8.75    87.5    91.25   93.75      | 3.75        8.75        11.25     30        | 25      72.5
        aubo_C    | 5       71.67   83.33   88.33      | 3.33        6.67        11.67     26.67     | 25      73.33
Avt(s)  iiwa_AB   | 0.278   3.031   1.714   1.145      | 0.127       0.495       1.294     2.728     | 3.080   8.92
        iiwa_C    | 0.288   7.622   4.284   2.132      | 0.232       1.108       2.464     6.989     | 7.691   13.65
        aubo_C    | 0.343   8.584   4.812   2.413      | 0.245       1.119       2.610     7.588     | 8.640   16.02
Sdt(s)  iiwa_AB   | 0.032   0.823   0.362   0.287      | 0.034       0.076       0.845     0.988     | 2.540   3.613
        iiwa_C    | 0.045   1.735   1.048   0.319      | 0.055       0.101       1.312     1.290     | 5.233   4.562
        aubo_C    | 0.078   1.821   1.101   0.334      | 0.057       0.106       1.377     1.354     | 5.494   3.974

1. Scr(%), Avt(s) and Sdt(s) denote the success rate, average computation time, and standard deviation of computation time, respectively.
2. iiwa_AB, iiwa_C and aubo_C denote 16 class A&B problems on LBR-iiwa, 16 class C problems on LBR-iiwa, and 12 class C problems on AUBO-i5, respectively.

Figure 5(a) shows how iSAGO drags the series of stuck arm configurations of Figure 4(c) out of the bookshelf by STOMA to safely grasp a cup located in the middle layer of the bookshelf. Meanwhile, Figure 5(b) shows how AUBO-i5 avoids the collision of Figure 4(d) to safely transfer the green piece between different cells of the storage shelf.

Table II shows that iSAGO gains the highest success rate compared to the others. The random SgdIter number $N_{\textit{sg}}$ and AGD help iSAGO gain the fourth-fastest solving rate after TrajOpt-12, TrajOpt-58, and GPMP2-12. Though TrajOpt-58 compensates for the continuous-safety information leakage of TrajOpt-12 in iiwa_AB, it still cannot escape from the local minima contained by the trust region and gains the second-lowest success rate, just above TrajOpt-12, in iiwa_C/aubo_C. In contrast, iSAGO successfully descends into an optimum with adaptive stochastic momenta and an appropriate initial trust region. Thanks to HMC, which randomly generates the Hamiltonian momenta, CHOMP approaches the optimum with the third-highest success rate; its failures are informed by the deterministic rather than the stochastic gradients. RRT-Connect with a limited time has the highest and the second-highest success rate in iiwa_AB and iiwa_C/aubo_C, respectively. However, a higher rate needs a smaller connection distance, which restricts the RRT growth and the computation efficiency. Though STOMP is free of gradient calculation, the significant time it takes to resample has only a minor effect on the feasible search, which is mainly limited by the Gaussian kernel. As for GPMP2-12 and TrajOpt-58, the LM and trust-region methods help approach the stationary point rapidly. In contrast, the stationary point in iiwa_C/aubo_C has significantly lower feasibility than that in iiwa_AB because the initial trajectory of iiwa_C/aubo_C gets stuck deeper.

The comparison between STOMA and SAGO in Table II shows how AGD performs a 45% accelerated descent towards an optimum. The comparison between iSAGO and SAGO indicates a 55% higher efficiency of incremental planning. The 90% lower success rate of AGD compared with SAGO shows the limitation of the numerical method and validates that STOMA can modify the manifold with fewer local minima by randomly selecting sub-functionals.

The comparison between iiwa_C and aubo_C in Table II shows that iSAGO performs better when the feasible solution lies near the motion constraints. That is because the motion constraint somewhat stabilizes the $\delta$-sized SGD of STOMA. So the large $\delta$-sized steps bring a severe disturbance and weaken STOMA's performance by 20% when obstacles wrap the feasible solution. Meanwhile, the comparison between iiwa_AB and iiwa_C shows that the stuck case reduces the success rate of the numerical methods by 90% and validates iSAGO's higher reliability in a narrow workspace under the same non-manual initial condition.

VI CONCLUSIONS

iSAGO utilizes the mixed information of accelerated (AGD) and stochastic (STOMA) momenta to overcome the body-obstacle stuck cases in a narrow workspace.

(i) STOMA performs an adaptive stochastic descent to avoid the local minima confronted by the numerical methods, and the results show the highest success rate among the compared planners.

(ii) The AGD integrated into iSAGO accelerates the descent informed by the first-order momenta. The results show it saves the computation resources demanded by the sampling methods and gains the fourth-fastest solving rate.

(iii) The incremental planning optimizes the sub-plannings to further elevate the whole-planning rate. The results show the 55% higher efficiency of iSAGO compared with SAGO.

References

  • [1] M. Zucker, N. Ratliff, A. D. Dragan, M. Pivtoraiko, M. Klingensmith, C. M. Dellin, J. A. Bagnell, and S. S. Srinivasa, “CHOMP: Covariant Hamiltonian optimization for motion planning,” The International Journal of Robotics Research, vol. 32, no. 9-10, pp. 1164–1193, 2013.
  • [2] M. Mukadam, J. Dong, X. Yan, F. Dellaert, and B. Boots, “Continuous-time Gaussian process motion planning via probabilistic inference,” The International Journal of Robotics Research, vol. 37, no. 11, pp. 1319–1340, 2018.
  • [3] J. Schulman, J. Ho, A. X. Lee, I. Awwal, H. Bradlow, and P. Abbeel, “Finding locally optimal, collision-free trajectories with sequential convex optimization.” in Robotics: science and systems, vol. 9, no. 1.   Citeseer, 2013, pp. 1–10.
  • [4] M. Bhardwaj, B. Boots, and M. Mukadam, “Differentiable Gaussian process motion planning,” in 2020 IEEE International Conference on Robotics and Automation (ICRA), May 2020, pp. 10598–10604.
  • [5] L. E. Kavraki, M. N. Kolountzakis, and J. . Latombe, “Analysis of probabilistic roadmaps for path planning,” IEEE Transactions on Robotics and Automation, vol. 14, no. 1, pp. 166–171, Feb 1998.
  • [6] S. M. LaValle et al., “Rapidly-exploring random trees: A new tool for path planning,” 1998.
  • [7] J. J. Kuffner and S. M. LaValle, “RRT-Connect: An efficient approach to single-query path planning,” in Proceedings 2000 ICRA. Millennium Conference. IEEE International Conference on Robotics and Automation. Symposia Proceedings (Cat. No.00CH37065), vol. 2, April 2000, pp. 995–1001.
  • [8] M. Kalakrishnan, S. Chitta, E. Theodorou, P. Pastor, and S. Schaal, “STOMP: Stochastic trajectory optimization for motion planning,” in 2011 IEEE International Conference on Robotics and Automation, May 2011, pp. 4569–4574.
  • [9] S. Karaman and E. Frazzoli, “Sampling-based algorithms for optimal motion planning,” The international journal of robotics research, vol. 30, no. 7, pp. 846–894, 2011.
  • [10] P. Rajendran, S. Thakar, A. M. Kabir, B. C. Shah, and S. K. Gupta, “Context-dependent search for generating paths for redundant manipulators in cluttered environments,” in 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2019, pp. 5573–5579.
  • [11] C. Park, J. Pan, and D. Manocha, “ITOMP: Incremental trajectory optimization for real-time replanning in dynamic environments,” in Proceedings of the Twenty-Second International Conference on Automated Planning and Scheduling, ser. ICAPS’12. AAAI Press, 2012, pp. 207–215.
  • [12] K. Shirley, “Inference from simulations and monitoring convergence,” Handbook of Markov Chain Monte Carlo, May 2011.
  • [13] M. Kaess, H. Johannsson, R. Roberts, V. Ila, J. J. Leonard, and F. Dellaert, “iSAM2: Incremental smoothing and mapping using the Bayes tree,” The International Journal of Robotics Research, vol. 31, no. 2, pp. 216–235, 2012.
  • [14] K. Levenberg, “A method for the solution of certain non-linear problems in least squares,” Quarterly of Applied Mathematics, vol. 2, no. 2, pp. 164–168, 1944.
  • [15] R. H. Byrd, J. C. Gilbert, and J. Nocedal, “A trust region method based on interior point techniques for nonlinear programming,” Mathematical programming, vol. 89, no. 1, pp. 149–185, 2000.
  • [16] J. Ichnowski, Y. Avigal, V. Satish, and K. Goldberg, “Deep learning can accelerate grasp-optimized motion planning,” Science Robotics, vol. 5, no. 48, p. eabd7710, 2020.
  • [17] A. Kuntz, C. Bowen, and R. Alterovitz, “Fast anytime motion planning in point clouds by interleaving sampling and interior point optimization,” in Robotics Research, N. M. Amato, G. Hager, S. Thomas, and M. Torres-Torriti, Eds.   Cham: Springer International Publishing, 2020, pp. 929–945.
  • [18] Y. Nesterov, “A method for unconstrained convex minimization problem with the rate of convergence O(1/k^2),” in Doklady AN USSR, vol. 269, 1983, pp. 543–547.
  • [19] S. Ghadimi and G. Lan, “Accelerated gradient methods for nonconvex nonlinear and stochastic programming,” Mathematical Programming, vol. 156, no. 1-2, pp. 59–99, 2016.
  • [20] W. Su, S. Boyd, and E. Candes, “A differential equation for modeling Nesterov’s accelerated gradient method: Theory and insights,” in Advances in Neural Information Processing Systems 27, Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, Eds. Curran Associates, Inc., 2014, pp. 2510–2518.
  • [21] A. C. Wilson, B. Recht, and M. I. Jordan, “A lyapunov analysis of momentum methods in optimization,” 2018.
  • [22] L. Bottou, “Large-scale machine learning with stochastic gradient descent,” in Proceedings of COMPSTAT’2010, Y. Lechevallier and G. Saporta, Eds.   Heidelberg: Physica-Verlag HD, 2010, pp. 177–186.
  • [23] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.
  • [24] H. Geoffrey, N. Srivastava, and K. Swersky, “Neural networks for machine learning lecture 6a overview of mini-batch gradient descent,” Department of Computer Science, University of Toronto, Tech. Rep. 14, 02 2012.
  • [25] X. Wu and M. W. Mueller, “In-flight range optimization of multicopters using multivariable extremum seeking with adaptive step size,” in 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2020, pp. 1545–1550.
  • [26] K. Solovey, O. Salzman, and D. Halperin, “New perspective on sampling-based motion planning via random geometric graphs,” The International Journal of Robotics Research, vol. 37, no. 10, pp. 1117–1133, 2018.
  • [27] E. W. Dijkstra, “A note on two problems in connexion with graphs,” Numerische Mathematik, vol. 1, no. 1, pp. 269–271, 1959.
  • [28] A. G. Hofmann, E. Fernandez, J. Helbert, S. D. Smith, and B. C. Williams, “Reactive integrated motion planning and execution,” in Twenty-Fourth International Joint Conference on Artificial Intelligence, 2015.
  • [29] L. Petrović, J. Peršić, M. Seder, and I. Marković, “Stochastic optimization for trajectory planning with heteroscedastic Gaussian processes,” in 2019 European Conference on Mobile Robots (ECMR), 2019, pp. 1–6.
  • [30] T. Osa, “Multimodal trajectory optimization for motion planning,” The International Journal of Robotics Research, vol. 39, no. 8, pp. 983–1001, 2020.
  • [31] J. Nocedal and S. Wright, Numerical optimization.   Springer Science & Business Media, 2006.
  • [32] Y. Nesterov, Introductory lectures on convex optimization: A basic course.   Springer Science & Business Media, 2013, vol. 87.
  • [33] B. O’Donoghue and E. Candès, “Adaptive restart for accelerated gradient schemes,” Foundations of Computational Mathematics, vol. 15, no. 3, pp. 715–732, 2015.
  • [34] J. Duchi, E. Hazan, and Y. Singer, “Adaptive subgradient methods for online learning and stochastic optimization,” J. Mach. Learn. Res., vol. 12, pp. 2121–2159, July 2011.
  • [35] I. Goodfellow, Y. Bengio, and A. Courville, Deep learning.   MIT press, 2016.