
Incrementally Stochastic and Accelerated Gradient Information mixed Optimization for Manipulator Motion Planning

Yichang Feng, Jin Wang, Haiyun Zhang, Guodong Lu

Manuscript received March 24, 2022; accepted June 26, 2022. This paper was recommended for publication by Editor Hanna Kurniawati upon evaluation of the Associate Editor and Reviewers' comments. This work was supported by the Key R&D Program of Zhejiang Province (2020C01025, 2020C01026), the National Natural Science Foundation of China (52175032), and the Robotics Institute of Zhejiang University Grant (K11808). The authors are with 1) the State Key Laboratory of Fluid Power and Mechatronic Systems, School of Mechanical Engineering, Zhejiang University, Hangzhou 310027, China, and 2) the Engineering Research Center for Design Engineering and Digital Twin of Zhejiang Province, School of Mechanical Engineering, Zhejiang University, Hangzhou 310027, China. Corresponding author: Jin Wang, dwjcom@zju.edu.cn. Digital Object Identifier (DOI): see top of this page.
Abstract

This paper introduces a novel motion planner, incrementally stochastic and accelerated gradient information mixed optimization (iSAGO), for robotic manipulators in a narrow workspace. First, we propose the overall scheme of iSAGO, which is informed by mixed momenta for efficient constrained optimization based on the penalty method. In the stochastic part, we generate adaptive stochastic momenta via the random selection of sub-functionals based on the adaptive momentum (Adam) method to solve the body-obstacle stuck case. Since the stochastic part converges slowly, we integrate accelerated gradient descent (AGD) to improve the planning efficiency. Moreover, we adopt Bayesian tree inference (BTI) to transform the whole-trajectory optimization (SAGO) into an incremental sub-trajectory optimization (iSAGO), which further improves the computation efficiency and success rate. Finally, we tune the key parameters and benchmark iSAGO against five other planners on an LBR-iiwa at a bookshelf and an AUBO-i5 at a storage shelf. The results show that iSAGO attains the highest success rate with moderate solving efficiency.

Index Terms:
Constrained Motion Planning, Collision Avoidance, Manipulation Planning

I INTRODUCTION

Industrial manufacturing widely applies robots in areas such as welding and product loading and placing. Most of these robots execute such tasks under manual teaching or programming, so automated production calls for an efficient and robust optimal motion planning (OMP) algorithm to elevate industrial intelligence.

Though former OMP studies obtain a collision-free optimal trajectory for safe and smooth motion via numerical optimization [1, 2, 3, 4] or probabilistic sampling [5, 6, 7, 8, 9, 10], two main concerns remain, which this paper aims to address:

(i) Reliability: The numerical methods, such as CHOMP [1], GPMP [2], and TrajOpt [3], can rapidly converge to a minimum with descent steps informed by the deterministic momenta/gradients. However, local minima (i.e., failed plannings) are unavoidable with an inappropriate initial point, because the momentum information only describes the manifold of a local space near the initial point. So it is difficult for them to plan a safe motion with a high success rate (i.e., high reliability) in a narrow space.

(ii) Efficiency: Sampling methods like STOMP [8] directly sample trajectories to approach the optimum. Others like RRT-Connect [7] grow a search tree from randomly sampled waypoints. Their sampling and wiring processes can generate safe trajectories without manifold information. However, their efficiency highly depends on the proportion of the overall search space occupied by the feasible subspace. So the indiscriminate process consumes enormous computation resources (i.e., low efficiency) in a narrow space.

Figure 1: A block diagram illustrating iSAGO (Algorithm 1, Section IV-A), which plans a collision-free smooth motion. The outer layer shows how a whole trajectory is refined via BTI-based sub-trajectory optimization, and PenIter integrates AGD (Algorithm 2, Section IV-B) and STOMA (Algorithm 3, Section IV-C) to solve the non-convex OMP by detecting the stuck case.

This paper proposes iSAGO (Figure 1), which integrates the Stochastic and Accelerated information of Gradients and builds a Bayes tree for an incremental Optimization:

(i) To address reliability, stochastic trajectory optimization with moment adaptation (STOMA, Section IV-C) overcomes the local minima caused by body-obstacle stuck cases by randomly selecting variables, such as collision-check balls, time intervals, and penalty factors. The stochastic momenta effectively perturb OMP's manifold so that it presents fewer local minima.

(ii) Considering the low efficiency of STOMA, whose convergence rate is $\mathcal{O}(\log N/N)$, iSAGO integrates accelerated gradient descent (AGD, Section IV-B) in the non-stuck case because its $\mathcal{O}(1/N^{3/2})$ convergence rate of the gradient norm is proven optimal in first-order convex optimization. Furthermore, iSAGO adopts Bayes tree inference to optimize the trajectory incrementally (Section IV-A) for a further efficiency gain, optimizing the convex and non-convex sub-trajectories separately within iSAGO's reTrajIter loop.

The experiments on 15 planning tasks with 44 problems on LBR-iiwa and AUBO-i5 (Section V-B) validate the higher reliability of iSAGO compared with the numerical methods [1, 2, 3] and its higher efficiency compared with the sampling methods [8, 7].

II RELATED WORKS

II-A Numerical Optimization

The main concern of numerical optimization is rapidly descending to an optimum. CHOMP [1], ITOMP [11], and GPMP [2] adopt gradient descent with a fixed step size. CHOMP uses Hamiltonian Monte Carlo (HMC) [12] to improve the success rate. To lower the computational cost, GPMP and dGPMP [4] adopt iSAM2 [13] for incremental planning, where each sub-planning converges at a super-linear rate via the Levenberg–Marquardt (LM) algorithm [14]. Meanwhile, ITOMP interleaves planning with task execution over short time periods to adapt to dynamic environments. Moreover, TrajOpt [3] uses the trust-region method [15] to improve efficiency; this method is also adopted by GOMP [16] for grasp-optimized motion planning with multiple warm restarts learned from a deep neural network. Instead of deep learning, ISIMP [17] interleaves sampling and interior-point optimization for planning. Nevertheless, the above methods may converge to local minima when the objective function is not strongly convex.

AGD [18, 19] has recently been developed and applied to the optimal control [20, 21] of dynamic systems described by differential equations, whether or not the objective is strongly convex. Moreover, stochastic optimization employs the momentum theorems of AGD to solve large-scale semi-convex cases [22], generating stochastic sub-gradients via random sub-dataset selection. Adam [23] upgrades RMSProp [24] and introduces an exponential moving average (EMA) for momentum adaptation. Furthermore, [25] adopts Adam to overcome the disturbance of a UAV system.

II-B Probabilistic Sampling

Unlike numerical methods, sampling methods construct a search graph to query feasible solutions or iteratively sample trajectories for motion planning.

PRM [5] and its asymptotically-optimal variants like PRM* [9] and RGGs [26] build collision-free connections among feasible vertices to construct a roadmap. They then construct an optimal trajectory via shortest-path (SP) algorithms like Dijkstra [27] and Chehov [28], which store and query partial trajectories efficiently. Unlike PRM associated with SP, RRT [6] and its asymptotically-optimal variants like RRT* [9] and CODEs [10] find a feasible solution by growing rapidly-exploring random trees (RRTs).

STOMP [8] resamples trajectories from a Gaussian distribution updated by importance samples. [29] introduces GPMP's functional to improve the search efficiency of STOMP. Moreover, SMTO [30] applies Monte-Carlo optimization to find multiple solutions and refines them numerically.

III PROBLEM FORMULATION

III-A Objective functional

We adopt the probabilistic inference model of GPMP [2] to infer a collision-free optimal trajectory given an environment with obstacles, a manipulator’s arm, start, and goal.

III-A1 GP prior

A Gaussian process (GP) describes a multivariate Gaussian distribution of the trajectory $\bm{\theta}\sim\mathcal{GP}(\bm{\mu},\bm{\mathcal{K}})$. $\bm{\mu}_{\mathbf{T}}=[\mu_{i}^{\top}]^{\top}\big|_{i\in\mathbf{T}}$ and $\bm{\mathcal{K}}_{\mathbf{T}}=[\mathcal{K}_{i,j}]\big|_{i,j\in\mathbf{T}}$ determine the expectation and covariance matrix of the trajectory $\bm{\theta}_{\mathbf{T}}=[\theta_{i}^{\top}]^{\top}\big|_{i\in\mathbf{T}}$, respectively, where the set $\mathbf{T}=\{t_{1},\dots,t_{2}\}$ denotes the period from $t_{1}$ to $t_{2}$. Since the above distribution originates from the Gauss–Markov model (GMM) (Section 4 of [2] introduces the GP prior, the GMM generated by a linear time-varying stochastic differential equation (LTV-SDE), and the GP interpolation used in Section 5 for up-sampling), informed by the start and goal under a zero-acceleration assumption, we define the smooth functional

\mathcal{F}_{gp}(\bm{\theta}_{\mathbf{T}})=\frac{1}{2}\left(\bm{\theta}_{\mathbf{T}}-\bm{\mu}_{\mathbf{T}}\right)^{\top}\bm{\mathcal{K}}_{\mathbf{T}}^{-1}\left(\bm{\theta}_{\mathbf{T}}-\bm{\mu}_{\mathbf{T}}\right).   (1)
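For concreteness, a minimal NumPy sketch of (1) is given below. The flattened trajectory shape and the precomputed kernel inverse are assumptions of this illustration; the kernel itself would come from the LTV-SDE construction of [2].

```python
import numpy as np

def gp_prior_cost(theta, mu, K_inv):
    """Smoothness functional (1): 0.5 * (theta - mu)^T K^{-1} (theta - mu).

    theta, mu : flattened support trajectories of shape (N * dof,)
    K_inv     : precomputed inverse GP kernel, shape (N * dof, N * dof)
    Returns the cost and its gradient K^{-1} (theta - mu).
    """
    r = theta - mu
    grad = K_inv @ r
    return 0.5 * r @ grad, grad
```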

III-A2 Collision avoidance

Since a collision-free trajectory means a safe motion or a successful planning, we first utilize $\bm{x}(\mathcal{B}_{i},t)$ to map a state $\theta_{t}$ at time $t$ to the position of a collision-check ball (CCB-$\mathcal{B}_{i}$) on the manipulator. Then we calculate the collision cost $c(\bm{x})$, which increases as the distance between $\mathcal{B}_{i}$ and its closest obstacle decreases, and define the obstacle functional

\mathcal{F}_{obs}(\bm{\theta}_{\mathbf{T}})=\sum_{t\in\mathbf{T}}\sum_{\mathcal{B}_{i}\in\bm{\mathcal{B}}}c\left[\bm{x}(\mathcal{B}_{i},t)\right]\cdot\left\|\dot{\bm{x}}(\mathcal{B}_{i},t)\right\|,   (2)

where $\bm{\mathcal{B}}=\{\mathcal{B}_{1},\dots,\mathcal{B}_{|\bm{\mathcal{B}}|}\}$ consists of $|\bm{\mathcal{B}}|$ CCBs.
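A literal (and deliberately unoptimized) sketch of (2) follows; the forward-kinematics helpers fk_pos/fk_vel and the distance-field cost c are hypothetical callables standing in for the robot model and the signed distance field.

```python
import numpy as np

def obstacle_cost(theta, timestamps, ccbs, fk_pos, fk_vel, c):
    """Obstacle functional (2): sum_t sum_B c[x(B_i, t)] * ||x_dot(B_i, t)||.

    theta          : (N, dof) array of joint-space waypoints
    timestamps     : iterable of time indices t in T
    ccbs           : list of collision-check ball identifiers B_i
    fk_pos(q, B_i) : workspace position of ball B_i (assumed helper)
    fk_vel(q, B_i) : workspace velocity of ball B_i (assumed helper)
    c(x)           : scalar collision cost from the distance field
    """
    total = 0.0
    for t in timestamps:
        for ball in ccbs:
            x = fk_pos(theta[t], ball)
            x_dot = fk_vel(theta[t], ball)
            total += c(x) * np.linalg.norm(x_dot)
    return total
```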

(i) Stuck case and local minima: The functional (2) sums over discrete CCBs to calculate the cost and gradient. So sub-gradients with opposite directions may arise in a narrow workspace. In this case, the gradient of (2) approaches zero while the collision cost is still high. This is the so-called body-obstacle stuck case that results in local minima. According to Section IV-C1 and the objective functional (6), all local minima are unsafe trajectories because the non-convexity originates from the obstacle part (2) caused by the stuck case, rather than from the convex GP part (1).

(ii) Continuous-time safety: Though the above form combines the collision cost $c(\bm{x})$ with the change rate $\|\dot{\bm{x}}\|$ of the CCB state for continuous-time collision avoidance between two adjacent waypoints, it still cannot ensure safety when the obstacle allocation is sparse: the combination at a single time $t$ cannot precisely represent the obstacle functional over a continuous-time interval measured by an infinite number of timestamps including $t$. So the up-sampling method [2] is adopted for continuous-time safety. According to the GMM, we get $\bm{\Phi}(t+1,t)$ transforming $\theta_{t}$ to $\theta_{t+1}$ and

\bm{\Lambda}_{t,\tau}=\bm{\Phi}(t+\tfrac{\tau}{n_{t}^{\textit{ip}}+1},t)-\bm{\Psi}_{t,\tau}\bm{\Phi}(t+1,t),   (3)
\bm{\Psi}_{t,\tau}=\mathbf{Q}_{t,\tau}\bm{\Phi}(t+1,t+\tfrac{\tau}{n_{t}^{\textit{ip}}+1})^{\top}\mathbf{Q}_{t,1}^{-1},   (4)

where $\mathbf{Q}_{t,\tau}$ transforms the GP kernel from $t$ to $(t+\tfrac{\tau}{n_{t}^{\textit{ip}}+1})$ with $\tau\in\{1,\dots,n_{t}^{\textit{ip}}\}$, and $n_{t}^{\textit{ip}}$ is the number of intervals between two adjacent waypoints $\{\theta_{t},\theta_{t+1}\}$. Then we utilize $\bm{\Lambda}_{t}=[\bm{\Lambda}_{t,1}^{\top}\dots\bm{\Lambda}_{t,n_{t}^{\textit{ip}}}^{\top}]^{\top}$ and $\bm{\Psi}_{t}=[\bm{\Psi}_{t,1}^{\top}\dots\bm{\Psi}_{t,n_{t}^{\textit{ip}}}^{\top}]^{\top}$ to form the up-sampling matrix of the trajectory $\bm{\theta}_{\mathbf{T}}$ in the period $\mathbf{T}$:

\mathbf{M}=\left[\begin{array}{cccccccc}
\mathbf{I}&\mathbf{0}&\mathbf{0}&\dots&\dots&\dots&\mathbf{0}&\mathbf{0}\\
\bm{\Lambda}_{t_{1}}&\bm{\Psi}_{t_{1}}&\mathbf{0}&\dots&\dots&\dots&\mathbf{0}&\mathbf{0}\\
\vdots&\vdots&\ddots&&&&\vdots&\vdots\\
\mathbf{0}&\mathbf{0}&\dots&\mathbf{I}&\mathbf{0}&\dots&\mathbf{0}&\mathbf{0}\\
\mathbf{0}&\mathbf{0}&\dots&\bm{\Lambda}_{t}&\bm{\Psi}_{t}&\dots&\mathbf{0}&\mathbf{0}\\
\mathbf{0}&\mathbf{0}&\dots&\mathbf{0}&\mathbf{I}&\dots&\mathbf{0}&\mathbf{0}\\
\vdots&\vdots&&&&\ddots&\vdots&\vdots\\
\mathbf{0}&\mathbf{0}&\dots&\dots&\dots&\dots&\bm{\Lambda}_{t_{2}-1}&\bm{\Psi}_{t_{2}-1}\\
\mathbf{0}&\mathbf{0}&\dots&\dots&\dots&\dots&\mathbf{0}&\mathbf{I}
\end{array}\right]   (5)

for an up-sampled trajectory $\bm{\theta}^{\textit{up}}_{\mathbf{T}}=\mathbf{M}(\bm{\theta}_{\mathbf{T}}-\bm{\mu}_{\mathbf{T}})+\bm{\mu}^{\textit{up}}_{\mathbf{T}}$. Section IV-C1 will detail the generation of a stochastic gradient in the time scale by the random selection of $\{n^{\textit{ip}}_{t}\,|\,t\in\mathbf{T}\}$.
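The following sketch assembles a dense up-sampling matrix following the block pattern of (5) and applies it as $\bm{\theta}^{\textit{up}}=\mathbf{M}(\bm{\theta}-\bm{\mu})+\bm{\mu}^{\textit{up}}$. The per-interval blocks $\bm{\Lambda}_t$, $\bm{\Psi}_t$ are assumed to be precomputed from (3)-(4); a real implementation would exploit sparsity.

```python
import numpy as np

def upsample_matrix(Lambdas, Psis, dof):
    """Assemble the up-sampling matrix M of (5) from per-interval blocks.

    Lambdas[t], Psis[t] : arrays of shape (n_ip[t] * dof, dof) stacking the
                          blocks Lambda_{t,tau} and Psi_{t,tau} of (3)-(4)
                          for tau = 1 ... n_ip[t] (assumed precomputed).
    """
    I = np.eye(dof)
    n_support = len(Lambdas) + 1
    cols = n_support * dof
    rows = []
    for t in range(n_support - 1):
        row_sup = np.zeros((dof, cols))                 # support waypoint t
        row_sup[:, t * dof:(t + 1) * dof] = I
        row_ip = np.zeros((Lambdas[t].shape[0], cols))  # interpolated states
        row_ip[:, t * dof:(t + 1) * dof] = Lambdas[t]
        row_ip[:, (t + 1) * dof:(t + 2) * dof] = Psis[t]
        rows += [row_sup, row_ip]
    row_last = np.zeros((dof, cols))                    # final support waypoint
    row_last[:, -dof:] = I
    rows.append(row_last)
    return np.vstack(rows)

def upsample(theta, mu, mu_up, M):
    """Up-sampled trajectory: theta_up = M (theta - mu) + mu_up (flattened)."""
    return M @ (theta - mu) + mu_up
```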

III-A3 Objective functional

This paper adopts the objective functional of [1, 2] to generate an optimal trajectory $\bm{\theta}_{\mathbf{T}}$ that is highly smooth and collision-free:

\mathcal{F}(\bm{\theta}_{\mathbf{T}})=\varrho\,\mathcal{F}_{gp}(\bm{\theta}_{\mathbf{T}})+\mathcal{F}_{obs}(\bm{\theta}_{\mathbf{T}}^{\textit{up}}),   (6)

where $\varrho$ trades off between the GP prior (smoothness) and obstacle (collision-free) functionals. We refer to [1, 2] for the detailed derivation of its gradient.
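Reusing the helpers sketched above, the objective (6) can be assembled as below; the obstacle-cost callable over the up-sampled trajectory (and its gradient pulled back through $\mathbf{M}^{\top}$ as in (16)) is assumed to be provided elsewhere.

```python
def objective(theta, mu, mu_up, K_inv, M, rho, obs_cost_up):
    """Objective (6): rho * F_gp(theta) + F_obs(theta_up) (a sketch).

    obs_cost_up : callable returning the obstacle cost (2) of the
                  up-sampled trajectory (assumed helper).
    """
    gp_cost, _ = gp_prior_cost(theta, mu, K_inv)
    theta_up = upsample(theta, mu, mu_up, M)
    return rho * gp_cost + obs_cost_up(theta_up)
```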

III-B Bayes tree construction

The above has introduced the relation between the stuck case and local minima, how to ensure continuous-time safety, and how to transform the collision-free and smoothness requirements into the Lagrangian-form objective functional of $\bm{\theta}_{\mathbf{T}}$. According to the definition of $\mathbf{T}$, we can group all timestamps $\{1,\dots,N\}$ into it to optimize the whole trajectory with $N$ waypoints, or group some of them $\{t_{1},\dots,t_{2}\}$ for sub-trajectories. These sub-problems can be solved incrementally to elevate the efficiency of trajectory optimization, like iGPMP's incremental replanning [2].

The Bayes tree (BT) definition in [2] tells us that a single chain of conjugated waypoints constructs a BT with a set of nodes $\Theta=\{\theta_{t}\,|\,t=1\dots N\}$ and branches $\{[\theta_{t},\theta_{t+1}]\,|\,t=0\dots N\}$. In this way, we utilize (6) to calculate a BT-factor

\mathcal{F}_{t}^{\text{bt}}=\mathcal{F}(\bm{\theta}_{\mathbf{T}})\text{ with }\mathbf{T}=\{t-1,t,t+1\},   (7)

informed by a minimum sub-trajectory. Section IV-A will detail how to utilize it for incremental optimization.
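As an illustration, the BT-factor (7) can be evaluated by restricting (6) to the three-waypoint window around t; the windowed kernel inverse and obstacle-cost helper below are assumptions about how an implementation slices the GP.

```python
def bt_factor(t, theta, mu, K_inv_window, rho, obs_cost_window):
    """BT-factor (7): objective (6) restricted to T = {t-1, t, t+1}.

    theta, mu          : (N, dof) arrays of waypoints and GP means (0-indexed)
    K_inv_window       : inverse GP kernel of the 3-waypoint sub-trajectory
    obs_cost_window(s) : obstacle cost of the sub-trajectory s (assumed helper)
    """
    window = slice(t - 1, t + 2)
    r = (theta[window] - mu[window]).reshape(-1)
    gp_cost = 0.5 * r @ K_inv_window @ r
    return rho * gp_cost + obs_cost_window(theta[window])
```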

IV METHODOLOGY

IV-A Incremental optimization with mixed steps

Given the start $\theta_{0}$ and goal $\theta_{N+1}$, iSAGO (Algorithm 1, Figure 1) first interpolates the support waypoints between them linearly to gain an initial trajectory. Then it finds a series of collision-free waypoints facing two planning scenarios: (i) the convex case satisfying

\mathcal{C}_{\theta}=\left\{\bm{\theta}^{\prime}\,\middle|\,\tfrac{|\mathcal{F}(\bm{\theta}^{\prime})-\mathcal{F}(\bm{\theta})-\langle\bar{\nabla}\mathcal{F}(\bm{\theta}),\bm{\theta}^{\prime}-\bm{\theta}\rangle|}{\|\bm{\theta}^{\prime}-\bm{\theta}\|^{2}}\leq\tfrac{\mathcal{L}_{\mathcal{F}}}{2}\right\};   (8)

(ii) the non-convex case where the trajectory gets stuck in obstacles, causing the local minima.

Given these two cases, iSAGO mixes the accelerated and stochastic gradient information. AGD (Section IV-B) solves case (i) with an optimal convergence informed by the first-order accelerated gradient. On the other hand, STOMA (Section IV-C) utilizes the stochastic gradient to drag the trajectory from case (ii) into a convex subspace, i.e., case (i). Moreover, iSAGO uses the penalty method [31] to nest the above methods in PenIter for constrained optimization.

Input: initial $\bm{\theta}$, $\mathcal{GP}(\bm{\mu},\bm{\mathcal{K}})$, BT, and tradeoff $\varrho$.
Output: optimized trajectory $\bm{\theta}$.
1  for reTrajIter: $i=1\dots N_{uf}$ do  // incremental OMP
2      Get noisy set $\bm{\Theta}_{\mathfrak{N}}=\bigcup_{j=1}^{|\bm{\Theta}_{\mathfrak{N}}|}\bm{\theta}_{\mathfrak{N}}^{j}$ via (10);
3      if $\bm{\Theta}_{\mathfrak{N}}=\emptyset$ then return.
4      for $j=1\dots|\bm{\Theta}_{\mathfrak{N}}|$ do  // optimize $\bm{\Theta}_{\mathfrak{N}}$
5          Get sub-trajectory $\bm{\theta}_{\mathbf{T}}\leftarrow\bm{\theta}_{\mathfrak{N}}^{j}$ with $\mathbf{T}\leftarrow\mathbf{T}^{j}_{\mathfrak{N}}$;
6          for PenIter: $1\dots N_{\varrho}$ do  // penalty method
7              Check the body-obstacle stuck case according to Section IV-C1;
8              if isStuck then $\bm{\theta}_{\mathbf{T}}\leftarrow\texttt{STOMA}(\bm{\theta}_{\mathbf{T}},\mathcal{GP}(\bm{\mu}_{\mathbf{T}},\bm{\mathcal{K}}_{\mathbf{T}}),\varrho)$;
9              else $\bm{\theta}_{\mathbf{T}}\leftarrow\texttt{AGD}(\bm{\theta}_{\mathbf{T}},\mathcal{GP}(\bm{\mu}_{\mathbf{T}},\bm{\mathcal{K}}_{\mathbf{T}}),\varrho)$;
10             if $\mathcal{F}_{obs}(\bm{\theta})<\textit{obs}\text{tol}$ then break;
11             else $\varrho\leftarrow\kappa_{\varrho}\cdot\varrho$;
12         Update the BT-factors $\{\mathcal{F}^{\text{bt}}_{t}\,|\,t\in\mathbf{T}\}$;
Algorithm 1: iSAGO

Formally, our mixed method optimizes the whole trajectory to plan a safe motion. However, it is inefficient because the trajectory consists of collision-free and in-collision (no-stuck and in-stuck) parts, each requiring a different method. So we adopt iSAM2 [13] and build a BT (Section III-B) with factors (7). Since an optimal BT has no significant factor, incremental-SAGO (iSAGO) finds the significant factors based on the mean and standard deviation of all factors:

\mu_{\mathcal{F}}=\sum_{t=1\dots N}w_{t}\mathcal{F}_{t}^{\text{bt}},\quad\mathcal{D}_{\mathcal{F}}=\sqrt{\sum_{t=1\dots N}w_{t}\left(\mathcal{F}_{t}^{\text{bt}}-\mu_{\mathcal{F}}\right)^{2}},   (9)

where $w_{t}=1/N$ and $N$ is the number of waypoints. Next, iSAGO gains a set of waypoints with significant factors:

\Theta_{\mathfrak{N}}=\left\{\theta_{t}\,\middle|\,|\mathcal{F}_{t}^{\text{bt}}-\mu_{\mathcal{F}}|>c_{\eta}\mathcal{D}_{\mathcal{F}},\;t=1\dots N\right\},   (10)

where a smaller $c_{\eta}$ filters out more factors. Then it divides the whole trajectory into slices, each consisting of adjacent $\theta\in\Theta_{\mathfrak{N}}$. After inserting one $\theta\notin\Theta_{\mathfrak{N}}$ at the head and tail of each slice, we get $\bm{\Theta}_{\mathfrak{N}}=\bigcup_{j=1}^{|\bm{\Theta}_{\mathfrak{N}}|}\bm{\theta}_{\mathfrak{N}}^{j}$ and stamp each sub-trajectory $\bm{\theta}_{\mathfrak{N}}^{j}$ by $\mathbf{T}^{j}_{\mathfrak{N}}=\{t\,|\,\theta_{t}\in\bm{\theta}_{\mathfrak{N}}^{j}\}$. When all sub-trajectories are optimized in iSAGO.reTrajIter, iSAGO updates the BT and $\bm{\Theta}_{\mathfrak{N}}$ for the incremental optimization.
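A minimal sketch of this bookkeeping, under the equal-weight choice $w_t=1/N$, is given below; the padding and grouping conventions are assumptions matching the description above.

```python
import numpy as np

def noisy_subtrajectories(bt_factors, c_eta=2.0):
    """Select significant BT-factors via (9)-(10) and slice them into
    contiguous sub-trajectories, padding each slice with one extra
    waypoint at its head and tail (a sketch of the reTrajIter step).

    bt_factors : sequence of F_t^bt for t = 1 ... N
    Returns a list of index sets T_N^j; an empty list signals convergence.
    """
    f = np.asarray(bt_factors, dtype=float)
    N = len(f)
    mu_f = f.mean()                                                # (9), w_t = 1/N
    d_f = np.sqrt(((f - mu_f) ** 2).mean())
    significant = np.flatnonzero(np.abs(f - mu_f) > c_eta * d_f)   # (10)
    if significant.size == 0:
        return []
    slices, start, prev = [], significant[0], significant[0]
    for idx in significant[1:]:
        if idx != prev + 1:                                        # gap closes the slice
            slices.append((start, prev))
            start = idx
        prev = idx
    slices.append((start, prev))
    # pad each slice with one waypoint at its head and tail, clipped to the ends
    return [list(range(max(a - 1, 0), min(b + 1, N - 1) + 1)) for a, b in slices]
```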

IV-B Accelerated Gradient Descent

This paper gathers the accelerated gradient information for optimization because it only requires the first-order momenta to achieve an optimal convergence [32] under the Lipschitz-continuous-gradient condition (8). So we adopt the descent rules of [19]:

\alpha_{k}=\frac{2}{k+1},\quad\beta_{k}=\frac{1}{2\mathcal{L}_{\mathcal{F}}},\quad\lambda_{k}=\frac{k\beta_{k}}{2},   (11)

then for all $n>1$ we have (see [19] for details: Lemma 1 provides an analytic view of the damped descent of Ghadimi's AGD, and Theorem 1 provides AGD's convergence rate, whose proof validates the accelerated descent process)

\min_{k=1\dots n}\|\bar{\nabla}\mathcal{F}(\bm{\theta}^{md}_{k})\|^{2}\leq\frac{96\,\mathcal{L}_{\mathcal{F}}\|\bm{\theta}^{*}-\bm{\theta}_{0}\|^{2}}{n^{2}(n+1)}.   (12)

Some former studies [18, 32, 19] set a fixed Lipschitz constant $\mathcal{L}_{\mathcal{F}}$ for AGD. However, (12) indicates a strong correlation between the super-linear convergence rate and $\mathcal{L}_{\mathcal{F}}$. Recent studies [20, 33] introduce restart schemes for speed-up when the problem is not globally convex. Algorithm 2 adopts the trust-region method [15] to adjust $\mathcal{L}_{\mathcal{F}}$ with factor $\kappa_{\mathcal{L}}$ and restart AGD until the condition

|\mathcal{F}_{k}-\mathcal{F}_{k-1}-\langle\bar{\nabla}\mathcal{F}_{k}^{md},\bm{\theta}_{k}-\bm{\theta}_{k-1}\rangle|\leq\tfrac{\mathcal{L}_{\mathcal{F}}}{2}\|\bm{\theta}_{k}-\bm{\theta}_{k-1}\|^{2}   (13)

is satisfied. (This paper abbreviates $\mathcal{F}(\bm{\theta}^{md}_{k})$, $\bar{\nabla}\mathcal{F}(\bm{\theta}^{md}_{k})$, $\mathcal{F}(\bm{\theta}^{ag}_{k})$ and $\mathcal{F}(\bm{\theta}_{k})$ as $\mathcal{F}^{md}_{k}$, $\bar{\nabla}\mathcal{F}^{md}_{k}$, $\mathcal{F}^{ag}_{k}$ and $\mathcal{F}_{k}$, and writes $\mathcal{F}^{\text{bt}}_{i}$ for the BT-factor at waypoint $\theta_{i}$.) In this way, we arbitrarily initialize $\mathcal{L}_{\mathcal{F}}$ as $\updelta_{0}\|\bar{\nabla}\mathcal{F}(\bm{\theta}_{0})\|$ and update it for a robust convergence.

Input: initial $\bm{\theta}_{0}$, penalty $\varrho$, and $\mathcal{GP}(\bm{\mu},\bm{\mathcal{K}})$.
Output: optimized $\bm{\theta}^{md}$.
1  Initialize: $\bm{\theta}_{0}^{ag}=\bm{\theta}_{0}$, $\bar{\nabla}\mathcal{F}_{1}^{md}=\bar{\nabla}\mathcal{F}_{0}$, $\mathcal{L}_{\mathcal{F}}=\updelta_{0}\|\bar{\nabla}\mathcal{F}_{0}\|$;
2  for $\mathcal{L}$Iter: $j=1\dots N_{\mathcal{L}}$ do  // adjust $\mathcal{L}_{\mathcal{F}}$
3      for AgdIter: $k=1\dots N_{ag}$ do  // AGD
4          $\bm{\theta}_{k}^{md}=(1-\alpha_{k})\,\bm{\theta}_{k-1}^{ag}+\alpha_{k}\bm{\theta}_{k-1}$;
5          if $k\geq 2$ then $\bar{\nabla}\mathcal{F}_{k}^{md}\leftarrow\bar{\nabla}\mathcal{F}(\bm{\theta}_{k}^{md})$;
6          Update $\alpha_{k}$, $\beta_{k}$ and $\lambda_{k}$ by (11);
7          $\bm{\theta}_{k}=\bm{\theta}_{k-1}-\lambda_{k}\bar{\nabla}\mathcal{F}_{k}^{md}$;
8          $\bm{\theta}_{k}^{ag}=\bm{\theta}_{k}^{md}-\beta_{k}\bar{\nabla}\mathcal{F}_{k}^{md}$;
9          if Converged (14) or isStuck then return.
10         if condition (13) is not satisfied then
11             $\{\bm{\theta}^{md}_{0},\bm{\theta}^{ag}_{0},\bm{\theta}_{0}\}\leftarrow\bm{\theta}^{md}_{k-1}$; $\mathcal{L}_{\mathcal{F}}\leftarrow\kappa_{\mathcal{L}}\cdot\mathcal{L}_{\mathcal{F}}$; $\bar{\nabla}\mathcal{F}_{1}^{md}\leftarrow\bar{\nabla}\mathcal{F}_{k-1}^{md}$; break
Algorithm 2: AGD

The AGD in Algorithm 2 executes until it converges,

|\mathcal{F}_{k}-\mathcal{F}_{k-1}|<\mathcal{F}\text{tol},\quad\|\bm{\theta}_{k}-\bm{\theta}_{k-1}\|<\theta\text{tol},   (14)

or detects the stuck case according to Section IV-C1.
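The sketch below condenses Algorithm 2 into plain Python under the rules (11), the restart test (13), and the convergence test (14); the cost/gradient callables, the stuck-case predicate, and the parameter defaults are assumptions of this illustration.

```python
import numpy as np

def agd(theta0, cost, grad, L0, kappa_L=6.67, n_restart=10, n_agd=50,
        f_tol=8e-4, theta_tol=1e-3, is_stuck=lambda g: False):
    """Accelerated gradient descent with trust-region-style restarts (a sketch)."""
    L = L0
    theta = theta0.copy()
    theta_ag = theta0.copy()
    for _ in range(n_restart):                        # adjust L_F
        theta_prev, f_prev = theta.copy(), cost(theta)
        for k in range(1, n_agd + 1):                 # AGD inner loop
            alpha = 2.0 / (k + 1)                     # (11)
            beta = 1.0 / (2.0 * L)
            lam = k * beta / 2.0
            theta_md = (1 - alpha) * theta_ag + alpha * theta
            g = grad(theta_md)
            if is_stuck(g):
                return theta_md                       # hand over to STOMA
            theta_new = theta - lam * g
            theta_ag = theta_md - beta * g
            f_new = cost(theta_new)
            step = theta_new - theta_prev
            # restart test (13): the L_F-Lipschitz model is violated
            if abs(f_new - f_prev - g @ step) > 0.5 * L * step @ step:
                theta = theta_md.copy()
                theta_ag = theta_md.copy()
                L *= kappa_L
                break
            # convergence test (14)
            if abs(f_new - f_prev) < f_tol and np.linalg.norm(step) < theta_tol:
                return theta_new
            theta, theta_prev, f_prev = theta_new, theta_new, f_new
        else:
            return theta
    return theta
```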

IV-C Stochastic Optimization with Momenta Adaptation

Some studies in deep learning [34, 24, 23, 35] propose stochastic gradient descent (SGD) for learning objectives with noisy or sparse gradients. They optimize an objective function consisting of loosely coupled sub-functions. Because of the higher robustness of Adam [23] under function variation, we adopt its moment-adaptation method and propose STOMA (Algorithm 3, Figure 2(b)) for the stuck case. This section first illustrates how to generate a stochastic gradient (SG) and detect the stuck case, and then introduces how to accelerate SGD with momentum adaptation.

Input: initial $\bm{\theta}_{0}$, $\varrho$ and $\mathcal{GP}(\bm{\mu},\bm{\mathcal{K}})$.
Output: optimized $\bm{\theta}^{sg}$.
1  for reSgdIter: $j=1\dots N_{\textit{rsg}}$ do  // restart SGD
2      Sample $\bm{\Theta}_{K}=\bm{\theta}_{0}\cup\{\bm{\theta}_{k}\sim\mathcal{GP}(\bm{\mu},\bm{\mathcal{K}})\,|\,k=1\dots K\}$;
3      Evaluate costs $\mathfrak{F}_{K}=\{\mathcal{F}[\bm{\theta}_{k}]\,|\,k=0,1,\dots,K\}$;
4      Initialize $k=0$, $\{\bm{\theta}_{0},\bm{\theta}_{0}^{ag}\}\leftarrow\arg\min_{\bm{\theta}\in\bm{\Theta}_{K}}\mathfrak{F}_{K}$;
5      while True do  // adaptive SGD (SgdIter)
6          $k\leftarrow k+1$; Update $N_{\textit{sg}}\sim\mathcal{U}(N_{lo}^{\textit{sg}},N_{up}^{\textit{sg}})$;
7          $\bm{\theta}_{k}^{sg}=(1-\alpha_{k})\,\bm{\theta}_{k-1}^{ag}+\alpha_{k}\bm{\theta}_{k-1}$;
8          Get $\bar{\nabla}\mathcal{F}^{sg}_{k}$ by (15); check the stuck case;
9          if not isStuck then return.
10         if $k\geq N_{\textit{sg}}$ or $\|\bm{\theta}_{k}-\bm{\theta}_{k-1}\|\leq\textit{SG}\text{tol}$ then
11             Update $\bm{\theta}_{0}=\bm{\theta}_{k}^{sg}$; break
12         Estimate the 2nd moment $\bm{\mathfrak{M}}_{k}$ via (18);
13         Update $\alpha_{k}$, $\bm{\mathfrak{B}}_{k}$ and $\lambda_{k}$ via (19);
14         $\bm{\theta}_{k}=\bm{\theta}_{k-1}-\lambda_{k}\bar{\nabla}\mathcal{F}^{sg}_{k}$;
15         $\bm{\theta}_{k}^{ag}=\bm{\theta}_{k}^{sg}-\bm{\mathfrak{B}}_{k}\bar{\nabla}\mathcal{F}^{sg}_{k}$;
Algorithm 3: STOMA
(a) A series of frames illustrates how the human arm reacts to obstacles when grasping inside a bookshelf. The green, orange, and red balls show no-risk, low-risk, and high-risk collision areas.
(b) STOMA drags the robot-arm out of the body-obstacle interference via adaptive stochastic descent. The green CCBs are selected for stochastic momenta (green arrow), while the red ones are abandoned.
Figure 2: The human-arm avoidance stimulated by the forces from the subparts inspires the robot-arm avoidance converged by stochastic gradient descent.

IV-C1 SG generation and Stuck case check

Since (6) reveals that the whole objective functional uses the penalty factor $\varrho$ to gather the GP prior and the obstacle information, which accumulates the sub-functions calculated by collision-check balls (CCBs) in discrete time, we generate an SG

\bar{\nabla}\mathcal{F}^{sg}=\hat{\varrho}\,\bar{\nabla}\mathcal{F}_{gp}+\bar{\nabla}\mathcal{F}_{\textit{obs}}^{sg}   (15)

in the Functional, Time, and Space scales:

(i) Functional: We select the penalty factor via $\frac{1}{\hat{\varrho}}\sim\mathcal{U}(0,\frac{1}{\varrho})$.

(ii) Time: We select a set $\{n_{t}^{\textit{ip}}\sim\mathcal{U}(0,N^{\textit{ip}})\,|\,t\in\mathbf{T}\}$, whose elements are the numbers of intervals between two support waypoints $[\theta_{t}^{\top},\theta_{t+1}^{\top}]^{\top}$, to gain a sub-gradient for collision check:

\bar{\nabla}\mathcal{F}_{\textit{obs}}^{sg}=\mathbf{M}^{\top}\cdot\mathbf{g}_{up},   (16)

where (5) generates the up-sampling matrix $\mathbf{M}$ with the set $\{n_{t}^{\textit{ip}}\,|\,t\in\mathbf{T}\}$ to map the randomly up-sampled gradient

\mathbf{g}_{up}=\left[\begin{array}{c}\nabla^{\top}\mathcal{F}_{\textit{obs}}^{\bm{\mathcal{B}}}\left[\theta(t_{1})\right],\\ \nabla^{\top}\mathcal{F}_{\textit{obs}}^{\bm{\mathcal{B}}}\left[\theta(t_{1}+\tfrac{1}{n_{t_{1}}^{\textit{ip}}+1})\right],\dots,\nabla^{\top}\mathcal{F}_{\textit{obs}}^{\bm{\mathcal{B}}}\left[\theta(t_{1}+1)\right],\\ \cdots\\ \nabla^{\top}\mathcal{F}_{\textit{obs}}^{\bm{\mathcal{B}}}\left[\theta(t_{2}-\tfrac{n_{t_{2}-1}^{\textit{ip}}}{n_{t_{2}-1}^{\textit{ip}}+1})\right],\dots,\nabla^{\top}\mathcal{F}_{\textit{obs}}^{\bm{\mathcal{B}}}\left[\theta(t_{2})\right]\end{array}\right]^{\top}

into the gradient $\bar{\nabla}\mathcal{F}_{\textit{obs}}^{sg}$ of $\bm{\theta}_{\mathbf{T}}$ with $\mathbf{T}=\{t_{1},\dots,t_{2}\}$.

(iii) Space: The intuition behind the Space rule comes from the collision avoidance of a human arm at a bookshelf in Figure 2(a). When a participant stretches for an object, he makes sequential reflexes in response to the body-obstacle collision, stimulated by the exterior forces (orange arrows) acting on the endangered parts (orange/red balls). It indicates that the collision risk rises from shoulder to hand because the number of orange/red balls rises from shoulder to hand. So we first reconstruct each rigid body of a tandem manipulator by geometrically connected CCBs. Next, we build each sub-problem (i.e., the collision avoidance of a single CCB) from shoulder to end-effector, considering the risk variation. Then $\nabla\mathcal{F}^{\bm{\mathcal{B}}}_{\textit{obs}}$ is calculated by accumulating the CCB gradients from $\mathcal{B}_{1}$ near the shoulder up to $\mathcal{B}_{k}$ near the end-effector:

\nabla\mathcal{F}_{obs}^{\bm{\mathcal{B}}}(\theta_{t})=\nabla\mathcal{F}_{obs}^{|\bm{\mathcal{B}}|}(\theta_{t}),\quad\nabla\mathcal{F}_{obs}^{k}=\sum_{\mathcal{B}_{i}\in\bm{\mathcal{B}}_{k},\,\phi_{i}\leq\phi_{\text{tol}}}\nabla_{\bm{q}_{i}}\mathcal{F}_{obs},\quad\bm{\mathcal{B}}_{k}=\{\mathcal{B}_{1},\dots,\mathcal{B}_{k}\},   (17)

where $\bm{q}_{i}$ contains the joints actuating CCB-$\mathcal{B}_{i}$, and $\phi_{i}$ is the included angle between $\mathcal{B}_{i}$'s gradient $\nabla_{\bm{q}_{i}}\mathcal{F}_{obs}$ and the accumulated gradient $\nabla\mathcal{F}_{obs}^{i-1}$ (from $\mathcal{B}_{1}$ to $\mathcal{B}_{i-1}$). The random tolerance angle $\phi_{\text{tol}}$ generates the SG in the Space scale by randomly rejecting $\nabla_{\bm{q}_{i}}\mathcal{F}_{obs}$ with $\phi_{i}>\phi_{\text{tol}}$, meaning that the space stochasticity vanishes when $\phi_{\text{tol}}\rightarrow 180\degree$.

(iv) Stuck case: Unlike the robot arm, which the SG drags out of the interference (Figure 2(b)), the human arm never gets stuck during avoidance (Figure 2(a)); still, we can imagine how confused a human-arm reflex would be if a pair of opposite forces stimulated it. So we select a constant $\phi_{\text{tol}}$ to detect the stuck case that confuses the collision-free planning. Figure 3 visualizes how an arm is stuck in an obstacle when $\exists\,\phi_{i}>\phi_{\text{tol}}$ and how a sub-trajectory containing a series of arm configurations is stuck. We do not adopt a pair-wise check because the risk varies among different parts, and a stuck check with a constant $\phi_{\text{tol}}$ synchronizes more efficiently with the SG generation driven by a random $\phi_{\text{tol}}$.
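A sketch of the shoulder-to-end-effector accumulation (17) and the angle-based check is given below; the per-CCB joint-space gradients (zero-padded to the full dof) are an assumed input, and the same routine serves both purposes depending on whether phi_tol is drawn randomly (SG generation) or held constant (stuck-case detection).

```python
import numpy as np

def space_scale_gradient(ccb_grads, phi_tol_deg):
    """Space rule (17): accumulate per-CCB gradients from shoulder to
    end-effector, rejecting any ball whose gradient deviates from the
    running sum by more than phi_tol (a sketch).

    ccb_grads : list of joint-space gradients nabla_{q_i} F_obs, ordered
                from B_1 (shoulder) to B_k (end-effector), zero-padded.
    Returns the accumulated gradient and a stuck-case flag.
    """
    acc = np.zeros_like(ccb_grads[0])
    stuck = False
    for g in ccb_grads:
        denom = np.linalg.norm(acc) * np.linalg.norm(g)
        if denom < 1e-12:
            acc = acc + g                   # no meaningful included angle yet
            continue
        cos_phi = np.clip(acc @ g / denom, -1.0, 1.0)
        if np.degrees(np.arccos(cos_phi)) > phi_tol_deg:
            stuck = True                    # opposing sub-gradient detected
            continue                        # random phi_tol: reject this ball
        acc = acc + g
    return acc, stuck

# A constant phi_tol (95 deg in Section V-B2) flags the stuck case, while a
# random phi_tol ~ U(60 deg, 180 deg) generates the space-scale SG.
```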

Figure 3: The grey CCBs assemble an LBR-iiwa, while the red ones denote an in-collision case. The stuck case occurs when the included angle $\phi>\phi_{\text{tol}}$.

IV-C2 Update rules

Adam [23] adopts the element-wise product of gradients, uses an EMA from the initial step up to the current step, and applies bias correction for adaptive momentum estimation. Our work uses the squared $\ell_{2}$-norm of the SG to estimate $\hat{\bm{\mathfrak{M}}}_{1}=\|\bar{\nabla}\mathcal{F}^{sg}_{1}\|^{2}$ and adopts EMA and bias correction to estimate the 2nd momenta $\{\bm{\mathfrak{M}}_{k}\}$:

\hat{\bm{\mathfrak{M}}}_{k}=\gamma\hat{\bm{\mathfrak{M}}}_{k-1}+(1-\gamma)\,\bar{\nabla}{\mathcal{F}^{sg}_{k}}^{\odot 2},\quad\bm{\mathfrak{M}}_{k}=\frac{\hat{\bm{\mathfrak{M}}}_{k}}{1-\gamma^{k}}.   (18)

As shown in Algorithm 3, we adopt the AGD rules [19] rather than the EMA with bias correction for the first-momentum adaptation. The update rules of $\{\alpha_{k},\bm{\mathfrak{B}}_{k},\lambda_{k}\}$ are also reset as

\alpha_{k}=\frac{2}{k+1},\quad\bm{\mathfrak{B}}_{k}=\frac{\bm{\mathfrak{M}}_{k}^{-\frac{1}{2}}}{2\delta},\quad\frac{\lambda_{k}}{|\bm{\mathfrak{B}}_{k}|}\in\left[1,1+\frac{\alpha_{k}}{4}\right].   (19)
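One adaptive step under (18)-(19) might look as follows; the element-wise scaling and the conservative scalar choice for $|\bm{\mathfrak{B}}_{k}|$ when setting $\lambda_{k}$ are assumptions of this sketch.

```python
import numpy as np

def stoma_step(k, theta, theta_ag, grad_sg, m_hat, gamma=0.9, delta=0.4):
    """One adaptive stochastic step of STOMA per (18)-(19) (a sketch).

    grad_sg(x) : stochastic gradient (15) evaluated at the mid point x
    m_hat      : running EMA of the squared stochastic gradient
    """
    alpha = 2.0 / (k + 1)
    theta_sg = (1 - alpha) * theta_ag + alpha * theta
    g = grad_sg(theta_sg)
    m_hat = gamma * m_hat + (1 - gamma) * g ** 2        # (18), EMA of squared SG
    m = m_hat / (1 - gamma ** k)                        # (18), bias correction
    B = 1.0 / (2.0 * delta * (np.sqrt(m) + 1e-12))      # (19), element-wise B_k
    lam = (1 + alpha / 4) * float(np.min(np.abs(B)))    # lambda_k/|B_k| in [1, 1+alpha/4]
    theta_new = theta - lam * g                         # lagged (conservative) step
    theta_ag_new = theta_sg - B * g                     # accelerated step
    return theta_new, theta_ag_new, m_hat
```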

In this way, the descent process, approximately bounded by a $\delta$-sized trust region, accelerates via the $\alpha$-linear interpolation between the conservative step and the accelerated step, driven by $\{\lambda_{k}\}$ and $\{\bm{\mathfrak{B}}_{k}\}$ respectively. As $\{\alpha_{k}\}$ varies, the process shifts from the lagged state guided by $\bm{\theta}$ to the accelerated one guided by $\bm{\theta}^{ag}$. Since the SGD is roughly Lipschitz continuous, we get

\mathcal{F}_{k-1}\leq\mathcal{F}_{k}+\langle\bar{\nabla}\mathcal{F}_{k-1},\bm{\theta}_{k}-\bm{\theta}_{k-1}\rangle\leq\mathcal{F}_{k}+\lambda_{k}\|\bar{\nabla}\mathcal{F}_{k}^{sg}\|^{2}+\lambda_{k}\|\Delta_{k}\|\,\|\bar{\nabla}\mathcal{F}_{k}^{sg}\|,

where $\mathcal{F}_{k}$ and $\mathcal{F}_{k}^{sg}$ abbreviate $\mathcal{F}(\bm{\theta}_{k})$ and $\mathcal{F}(\bm{\theta}_{k}^{sg})$, and

\|\Delta_{k}\|=\|\bar{\nabla}\mathcal{F}_{k-1}-\bar{\nabla}\mathcal{F}_{k}^{sg}\|\leq\mathcal{L}_{\mathcal{F}}^{sg}(1-\alpha_{k})\|\bm{\theta}_{k-1}^{ag}-\bm{\theta}_{k-1}\|,
\bm{\theta}_{k}^{ag}-\bm{\theta}_{k}=(1-\alpha_{k})(\bm{\theta}_{k-1}^{ag}-\bm{\theta}_{k-1})+(\lambda_{k}-\bm{\mathfrak{B}}_{k})\bar{\nabla}\mathcal{F}_{k}^{sg}

according to the interpolation and update steps of Algorithm 3. Combining the Cauchy–Schwarz inequality, we get

\mathcal{F}_{k-1}\leq\mathcal{F}_{k}+\lambda_{k}\left(1+\tfrac{\mathcal{L}_{\mathcal{F}}^{sg}\lambda_{k}}{2}\right)\|\bar{\nabla}\mathcal{F}_{k}^{sg}\|^{2}+\tfrac{\mathcal{L}_{\mathcal{F}}^{sg}(1-\alpha_{k})^{2}}{2}\|\bm{\theta}_{k-1}^{ag}-\bm{\theta}_{k-1}\|^{2}.

Then through the accumulation from $k=1$ to $N$, we get

\frac{\mathcal{F}_{0}-\mathcal{F}_{N}}{\mathcal{L}_{\mathcal{F}}^{sg}}\leq\sum_{k=1}^{N}\left(\lambda_{k}+\frac{\lambda_{k}^{2}}{2}+\frac{(\lambda_{k}-\beta_{k})^{2}}{2\Gamma_{k}\alpha_{k}}\sum_{l=k}^{N}\Gamma_{l}\right)\|\bar{\nabla}\mathcal{F}_{k}^{sg}\|^{2},

where $\beta_{k}=|\bm{\mathfrak{B}}_{k}|_{\infty^{-}}$ and $\Gamma_{l}=(1-\alpha_{l})\Gamma_{l-1}$ with $\Gamma_{0}=1$.

In view of the stochasticity of $\mathcal{F}(\bm{\theta}^{sg})$ and its gradient $\bar{\nabla}\mathcal{F}^{sg}$, and presuming $\|\bar{\nabla}\mathcal{F}_{k}\|\,c_{l}\,\delta\geq\mathcal{L}_{\mathcal{F}}^{sg}$, we get

\frac{\mathcal{F}_{0}-\mathcal{F}_{N}}{N}\leq\frac{5\mathbf{G}_{\infty}}{4\delta}\left(1+\frac{11c_{l}}{32}\right)\frac{\ln(N+1)}{N},   (20)

where $c_{l}$ is the fraction between $\mathcal{L}^{sg}_{\mathcal{F}}$ and the trust-region box $\delta\cdot\|\bar{\nabla}\mathcal{F}_{k}\|$, and $\mathbf{G}_{\infty}\leq\mathds{E}(\|[\bar{\nabla}^{\top}\mathcal{F}_{1}^{sg},\dots,\bar{\nabla}^{\top}\mathcal{F}_{N}^{sg}]\|_{\infty})$.

Figure 2(b) illustrates how a robot reacts to the stuck case like the human in Figure 2(a), following the update rules in SgdIter. To improve the efficiency and reliability, we nest the above SGD (i.e., SgdIter) in reSgdIter. Once the SGD is restarted, we select the trajectory with the lowest cost from the sample set $\bm{\Theta}_{K}$ to reinitialize the next SGD, whose maximum iteration number $N_{\textit{sg}}$ is selected randomly in $[N_{lo}^{\textit{sg}},N_{up}^{\textit{sg}}]$.
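A sketch of the restart step in reSgdIter is given below; drawing prior samples via a multivariate normal and the uniform iteration budget are assumptions matching Algorithm 3 and the parameter choices in Section V-B2.

```python
import numpy as np

def restart_sgd(theta_current, mu, K_cov, cost, rng,
                n_samples=12, n_sg_lo=35, n_sg_up=55):
    """reSgdIter restart (a sketch): re-seed the inner SGD from the best of
    the current iterate and K trajectories sampled from the GP prior, and
    draw a random iteration budget N_sg ~ U(N_lo, N_up)."""
    candidates = [np.asarray(theta_current)]
    candidates += list(rng.multivariate_normal(mu, K_cov, size=n_samples))
    costs = [cost(th) for th in candidates]
    theta0 = candidates[int(np.argmin(costs))]
    n_sg = int(rng.integers(n_sg_lo, n_sg_up + 1))
    return theta0, n_sg

# Usage: rng = np.random.default_rng(0); theta0, n_sg = restart_sgd(...)
```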

V EXPERIMENT

This section will first introduce the objective functional implementation in Section V-A. Then it will detail the benchmark and parameter setting in Sections V-B1 & V-B2 and analyze the benchmark results in Section V-B3.

V-A Implementation details

V-A1 GP prior

We assume the velocity is constant and gain the GMM and GP prior (1) to reduce the dimension. Our benchmark uses $N=12$ support states with $N^{\textit{ip}}=8$ intervals (116 states in total) for iSAGO and GPMP2, 116 states for STOMP and CHOMP, 12 and 58 states for TrajOpt (since TrajOpt proposes a swept-out area between support waypoints for continuous-time safety, we choose TrajOpt-12 for safety validation and TrajOpt-58 to ensure the dimensional consistency of the benchmark), and MaxConnectionDistance $=\frac{\|\theta_{g}-\theta_{0}\|}{12}$ for RRT-Connect. All implementations except RRT-Connect apply linear interpolation rather than manual selection to initialize the trajectory for a fair comparison.

V-A2 Collision cost function

We adopt the polynomial piecewise form of [1] for the collision cost $c(\bm{x})$ in (2).

V-A3 Motion constraints

The motion cost function in GPMP [2] is adopted to drive the robot along the preplanned optimal collision-free trajectory in the real world.

V-B Evaluation

V-B1 Setup for benchmark

This paper benchmarks iSAGO against the numerical planners (CHOMP [1], TrajOpt [3], GPMP2 [2]) and the sampling planners (STOMP [8], RRT-Connect [7]) on a 7-DoF robot (LBR-iiwa) and a 6-DoF robot (AUBO-i5). Since the benchmark executes in MATLAB, we use BatchTrajOptimize3DArm of the GPMP2 toolbox to implement GPMP2; plannarBiRRT with a 20 s maximum time for RRT-Connect; fmincon for TrajOpt (configured with optimoptions('fmincon','Algorithm','trust-region-reflective','SpecifyObjectiveGradient',true) to apply the trust-region method like TrajOpt, where the objective gradient is calculated analytically and Aeq, a matrix whose rows are the constraint gradients, by numerical differentiation); hmcSampler for CHOMP (an hmcSampler object whose log probability density function logpdf is defined by (6), using hmcSampler.drawSamples for the HMC adopted by CHOMP); and mvnrnd for STOMP (which samples the noisy trajectories by mvnrnd and updates the trajectory via projected weighted averaging). Since all of them are highly tuned in their own studies, our benchmark uses their default settings.

To illustrate the competence of iSAGO for planning tasks, we conduct 25 experiments on LBR-iiwa at a bookshelf and 12 experiments on AUBO-i5 at a storage shelf. We categorize all tasks into 3 classes (A, B, and C) whose planning difficulty rises as the number of stuck cases in the initial trajectory increases. Figure 4 visually validates our classification because the number of red in-collision CCBs increases from Task A-1 to Task C-3. Considering the difficulty of the different classes, we first generate 2 tasks of class A, 3 tasks of class B, and 4 tasks of class C for LBR-iiwa (Figures 4(a)-4(c)). Then we generate 6 tasks of class C for AUBO-i5 (Figure 4(d)). Each task consists of 2 to 4 problems with the same initial state and goal constraint. (Since LBR-iiwa has 7 DoFs, meaning infinite solutions for the same goal constraint, our benchmark uses LM [14] with random initial points to generate different goal states; so one planning task of LBR-iiwa with the same goal constraint contains several problems with different goal states. Meanwhile, each task of AUBO-i5 has two problems with the same initial state and goal constraint because the same goal constraint has only one solution for the 6-DoF robot, and we switch the yaw angle between $\{0\degree,180\degree\}$ for each.) Moreover, we compare iSAGO with SAGO to show the efficiency of the incremental method and compare SAGO with STOMA or AGD to show the efficiency or reliability of the mixed optimization.

(a) Task A-1, 6 red CCBs
(b) Task B-1, 11 red CCBs
(c) Task C-3, 22 red CCBs
(d) Task C-6, 35 red CCBs
Figure 4: The initial trajectory with red collision parts visualizes our benchmark on LBR-iiwa or AUBO-i5. Task C-6 means the No.6 task of class C.
(a) Trajectory C-3.2
(b) Trajectory C-6.2
Figure 5: Some results of iSAGO in class C (Trajectory C-3.2 means the 2nd planning result of task 3 in class C).

V-B2 Parameter setting

Since STOMA (Algorithm 3, Figure 2(b)) performs SGD roughly contained in a $\delta$-sized trust region with a $\gamma$-damped EMA of the historical momenta, this section first tunes these two parameters, then the others.

Based on extensive experience, we tune the key parameters $\delta\in[0.04,0.80]$ and $\gamma\in[0.50,0.99]$ and conduct 10 different class C tasks for each pair of parameters. Table I indicates that the SGD rate increases as $\delta$ expands, while the success rate decreases with an excessive $\delta$ (either too high or too low). Moreover, as $\gamma$ shrinks, the SGD rate increases while the success rate decreases. Overall, a smaller step size $\delta$ makes iSAGO more stable but hampers its ability to overcome local minima, while combining fewer historical momenta makes it faster but less reliable. So we set $\delta=0.40$ and $\gamma=0.90$.

TABLE I: Results for $\{\delta,\gamma\}$ tuning on 10 tasks in class C

            delta = 0.80   delta = 0.40   delta = 0.08   delta = 0.04
gamma 0.50   80 | 1.253     90 | 1.898     60 | 4.198     40 | 6.309
gamma 0.90   90 | 2.512    100 | 2.463     80 | 4.973     60 | 8.219
gamma 0.99   90 | 2.672    100 | 3.089     80 | 7.131     60 | 12.57

Each cell reads "success rate (%) | average computation time (s)".

Besides $\delta$ and $\gamma$, we set the initial penalty $\varrho=1.25\text{e-}2$, penalty factor $\kappa_{\varrho}=0.4$, and filter factor $c_{\eta}=2.0$ in Algorithm 1. In Algorithm 2, we set the tolerances $\mathcal{F}\text{tol}=8\text{e-}4$, $\theta\text{tol}=1\text{e-}3$, $\textit{obs}\text{tol}=1\text{e-}4$ and the scaling factor $\kappa_{\mathcal{L}}=6.67$. In Algorithm 3, we set the number of samples $K=12$, the tolerance $\textit{SG}\text{tol}=6.4\text{e-}3$, and the number of SgdIter $N_{\textit{sg}}\sim\mathcal{U}(35,55)$. Moreover, we select $\phi_{\text{tol}}=95\degree$ to check the stuck case and $\phi_{\text{tol}}\sim\mathcal{U}(60\degree,180\degree)$ to generate the SG.

V-B3 Result analysis

TABLE II: Results of 15 tasks (44 planning problems) with 5 repeated tests

                  | Our incremental mixed optimization | Numerical optimization                      | Probabilistic sampling
        Problem   | AGD     STOMA   SAGO    iSAGO      | TrajOpt-12  TrajOpt-58  GPMP2-12  CHOMP     | STOMP   RRT-Connect
Scr(%)  iiwa_AB   | 75      97.5    100     100        | 27.5        77.5        81.25     87.5      | 82.5    100
        iiwa_C    | 8.75    87.5    91.25   93.75      | 3.75        8.75        11.25     30        | 25      72.5
        aubo_C    | 5       71.67   83.33   88.33      | 3.33        6.67        11.67     26.67     | 25      73.33
Avt(s)  iiwa_AB   | 0.278   3.031   1.714   1.145      | 0.127       0.495       1.294     2.728     | 3.080   8.92
        iiwa_C    | 0.288   7.622   4.284   2.132      | 0.232       1.108       2.464     6.989     | 7.691   13.65
        aubo_C    | 0.343   8.584   4.812   2.413      | 0.245       1.119       2.610     7.588     | 8.640   16.02
Sdt(s)  iiwa_AB   | 0.032   0.823   0.362   0.287      | 0.034       0.076       0.845     0.988     | 2.540   3.613
        iiwa_C    | 0.045   1.735   1.048   0.319      | 0.055       0.101       1.312     1.290     | 5.233   4.562
        aubo_C    | 0.078   1.821   1.101   0.334      | 0.057       0.106       1.377     1.354     | 5.494   3.974

1. Scr(%), Avt(s) and Sdt(s) denote the success rate, average computation time, and standard deviation of computation time, respectively.
2. iiwa_AB, iiwa_C and aubo_C denote 16 class A&B problems on LBR-iiwa, 16 class C problems on LBR-iiwa, and 12 class C problems on AUBO-i5, respectively.

Figure 5(a) shows how iSAGO drags the series of stuck arm configurations of Figure 4(c) out of the bookshelf by STOMA to safely grasp a cup located in the middle layer of the bookshelf. Meanwhile, Figure 5(b) shows how AUBO-i5 avoids the collision of Figure 4(d) to safely transfer the green piece between different cells of the storage shelf.

Table II shows that iSAGO gains the highest success rate compared to the others. The random SgdIter number $N_{\textit{sg}}$ and AGD help iSAGO gain the fourth-fastest solving rate after TrajOpt-12, TrajOpt-58, and GPMP2-12. Though TrajOpt-58 compensates for the continuous-safety information leakage of TrajOpt-12 in iiwa_AB, it still cannot escape from the local minima contained by the trust region and gains the second-lowest success rate, just above TrajOpt-12, in iiwa_C/aubo_C. In contrast, iSAGO successfully descends into an optimum with adaptive stochastic momenta and an appropriate initial trust region. Thanks to HMC, which randomly generates the Hamiltonian momenta, CHOMP approaches the optimum with the third-highest success rate; its failures are informed by the deterministic rather than the stochastic gradients. RRT-Connect with a limited time has the highest and the second-highest success rate in iiwa_AB and iiwa_C/aubo_C, respectively. However, a higher rate needs a smaller connection distance, which restricts the RRT growth and the computation efficiency. Though STOMP is free of gradient calculation, the significant time it takes to resample has only a minor effect on the feasible search, which is mainly limited by the Gaussian kernel. As for GPMP2-12 and TrajOpt-58, the LM and trust-region methods help approach the stationary point rapidly. In contrast, the stationary point in iiwa_C/aubo_C has significantly lower feasibility than that in iiwa_AB because the initial trajectory of iiwa_C/aubo_C gets stuck deeper.

The comparison between STOMA and SAGO in Table II shows how AGD performs a 45% accelerated descent towards an optimum. The comparison between iSAGO and SAGO indicates a 55% higher efficiency of incremental planning. The 90% lower success rate of AGD compared with SAGO shows the limitation of the numerical method and validates that STOMA can modify the manifold with fewer local minima by randomly selecting sub-functionals.

The comparison between iiwa_C and aubo_C in Table II shows that iSAGO performs better when the feasible solution lies near the motion constraints. That is because the motion constraint somewhat stabilizes the $\delta$-sized SGD of STOMA. So the large $\delta$-sized steps bring a severe disturbance and weaken STOMA's performance by 20% when obstacles wrap the feasible solution. Meanwhile, the comparison between iiwa_AB and iiwa_C shows that the stuck case reduces the success rate of the numerical methods by 90% and validates iSAGO's higher reliability in a narrow workspace under the same non-manual initial condition.

VI CONCLUSIONS

iSAGO utilizes the mixed information of accelerated (AGD) and stochastic (STOMA) momenta to overcome the body-obstacle stuck cases in a narrow workspace.

(i) STOMA performs an adaptive stochastic descent to avoid the local minima confronted by the numerical methods, and the results show the highest success rate among the compared planners.

(ii) The AGD integrated into iSAGO accelerates the descent informed by the first-order momenta. The results show it saves the computation resources demanded by the sampling methods and gains the fourth-fastest solving rate.

(iii) The incremental planning optimizes the sub-plannings to further elevate the whole-planning rate. The results show the 55% higher efficiency of iSAGO compared with SAGO.

References

  • [1] M. Zucker, N. Ratliff, A. D. Dragan, M. Pivtoraiko, M. Klingensmith, C. M. Dellin, J. A. Bagnell, and S. S. Srinivasa, “CHOMP: Covariant Hamiltonian optimization for motion planning,” The International Journal of Robotics Research, vol. 32, no. 9-10, pp. 1164–1193, 2013.
  • [2] M. Mukadam, J. Dong, X. Yan, F. Dellaert, and B. Boots, “Continuous-time Gaussian process motion planning via probabilistic inference,” The International Journal of Robotics Research, vol. 37, no. 11, pp. 1319–1340, 2018.
  • [3] J. Schulman, J. Ho, A. X. Lee, I. Awwal, H. Bradlow, and P. Abbeel, “Finding locally optimal, collision-free trajectories with sequential convex optimization.” in Robotics: science and systems, vol. 9, no. 1.   Citeseer, 2013, pp. 1–10.
  • [4] M. Bhardwaj, B. Boots, and M. Mukadam, “Differentiable Gaussian process motion planning,” in 2020 IEEE International Conference on Robotics and Automation (ICRA), May 2020, pp. 10598–10604.
  • [5] L. E. Kavraki, M. N. Kolountzakis, and J. . Latombe, “Analysis of probabilistic roadmaps for path planning,” IEEE Transactions on Robotics and Automation, vol. 14, no. 1, pp. 166–171, Feb 1998.
  • [6] S. M. LaValle et al., “Rapidly-exploring random trees: A new tool for path planning,” 1998.
  • [7] J. J. Kuffner and S. M. LaValle, “RRT-Connect: An efficient approach to single-query path planning,” in Proceedings 2000 ICRA. Millennium Conference. IEEE International Conference on Robotics and Automation. Symposia Proceedings (Cat. No.00CH37065), vol. 2, April 2000, pp. 995–1001.
  • [8] M. Kalakrishnan, S. Chitta, E. Theodorou, P. Pastor, and S. Schaal, “STOMP: Stochastic trajectory optimization for motion planning,” in 2011 IEEE International Conference on Robotics and Automation, May 2011, pp. 4569–4574.
  • [9] S. Karaman and E. Frazzoli, “Sampling-based algorithms for optimal motion planning,” The international journal of robotics research, vol. 30, no. 7, pp. 846–894, 2011.
  • [10] P. Rajendran, S. Thakar, A. M. Kabir, B. C. Shah, and S. K. Gupta, “Context-dependent search for generating paths for redundant manipulators in cluttered environments,” in 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2019, pp. 5573–5579.
  • [11] C. Park, J. Pan, and D. Manocha, “ITOMP: Incremental trajectory optimization for real-time replanning in dynamic environments,” in Proceedings of the Twenty-Second International Conference on Automated Planning and Scheduling, ser. ICAPS’12. AAAI Press, 2012, pp. 207–215.
  • [12] K. Shirley, “Inference from simulations and monitoring convergence,” Handbook of Markov Chain Monte Carlo, May 2011.
  • [13] M. Kaess, H. Johannsson, R. Roberts, V. Ila, J. J. Leonard, and F. Dellaert, “iSAM2: Incremental smoothing and mapping using the Bayes tree,” The International Journal of Robotics Research, vol. 31, no. 2, pp. 216–235, 2012.
  • [14] K. Levenberg, “A method for the solution of certain non-linear problems in least squares,” Quarterly of Applied Mathematics, vol. 2, no. 2, pp. 164–168, 1944.
  • [15] R. H. Byrd, J. C. Gilbert, and J. Nocedal, “A trust region method based on interior point techniques for nonlinear programming,” Mathematical programming, vol. 89, no. 1, pp. 149–185, 2000.
  • [16] J. Ichnowski, Y. Avigal, V. Satish, and K. Goldberg, “Deep learning can accelerate grasp-optimized motion planning,” Science Robotics, vol. 5, no. 48, p. eabd7710, 2020.
  • [17] A. Kuntz, C. Bowen, and R. Alterovitz, “Fast anytime motion planning in point clouds by interleaving sampling and interior point optimization,” in Robotics Research, N. M. Amato, G. Hager, S. Thomas, and M. Torres-Torriti, Eds.   Cham: Springer International Publishing, 2020, pp. 929–945.
  • [18] Y. Nesterov, “A method for unconstrained convex minimization problem with the rate of convergence O(1/k^2),” in Doklady AN USSR, vol. 269, 1983, pp. 543–547.
  • [19] S. Ghadimi and G. Lan, “Accelerated gradient methods for nonconvex nonlinear and stochastic programming,” Mathematical Programming, vol. 156, no. 1-2, pp. 59–99, 2016.
  • [20] W. Su, S. Boyd, and E. Candes, “A differential equation for modeling Nesterov’s accelerated gradient method: Theory and insights,” in Advances in Neural Information Processing Systems 27, Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, Eds. Curran Associates, Inc., 2014, pp. 2510–2518.
  • [21] A. C. Wilson, B. Recht, and M. I. Jordan, “A lyapunov analysis of momentum methods in optimization,” 2018.
  • [22] L. Bottou, “Large-scale machine learning with stochastic gradient descent,” in Proceedings of COMPSTAT’2010, Y. Lechevallier and G. Saporta, Eds.   Heidelberg: Physica-Verlag HD, 2010, pp. 177–186.
  • [23] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.
  • [24] H. Geoffrey, N. Srivastava, and K. Swersky, “Neural networks for machine learning lecture 6a overview of mini-batch gradient descent,” Department of Computer Science, University of Toronto, Tech. Rep. 14, 02 2012.
  • [25] X. Wu and M. W. Mueller, “In-flight range optimization of multicopters using multivariable extremum seeking with adaptive step size,” in 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2020, pp. 1545–1550.
  • [26] K. Solovey, O. Salzman, and D. Halperin, “New perspective on sampling-based motion planning via random geometric graphs,” The International Journal of Robotics Research, vol. 37, no. 10, pp. 1117–1133, 2018.
  • [27] E. W. Dijkstra, “A note on two problems in connexion with graphs,” Numerische Mathematik, vol. 1, no. 1, pp. 269–271, 1959.
  • [28] A. G. Hofmann, E. Fernandez, J. Helbert, S. D. Smith, and B. C. Williams, “Reactive integrated motion planning and execution,” in Twenty-Fourth International Joint Conference on Artificial Intelligence, 2015.
  • [29] L. Petrović, J. Peršić, M. Seder, and I. Marković, “Stochastic optimization for trajectory planning with heteroscedastic Gaussian processes,” in 2019 European Conference on Mobile Robots (ECMR), 2019, pp. 1–6.
  • [30] T. Osa, “Multimodal trajectory optimization for motion planning,” The International Journal of Robotics Research, vol. 39, no. 8, pp. 983–1001, 2020.
  • [31] J. Nocedal and S. Wright, Numerical optimization.   Springer Science & Business Media, 2006.
  • [32] Y. Nesterov, Introductory lectures on convex optimization: A basic course.   Springer Science & Business Media, 2013, vol. 87.
  • [33] B. O’Donoghue and E. Candès, “Adaptive restart for accelerated gradient schemes,” Foundations of Computational Mathematics, vol. 15, no. 3, pp. 715–732, 2015.
  • [34] J. Duchi, E. Hazan, and Y. Singer, “Adaptive subgradient methods for online learning and stochastic optimization,” J. Mach. Learn. Res., vol. 12, pp. 2121–2159, July 2011.
  • [35] I. Goodfellow, Y. Bengio, and A. Courville, Deep learning.   MIT press, 2016.