
MA-DV2F: A Multi-Agent Navigation Framework using Dynamic Velocity Vector Field

Yining Ma^{1,2}, Qadeer Khan^{1,2} and Daniel Cremers^{1,2}
^1 Computer Vision Group, School of Computation, Information and Technology, Technical University of Munich. Contact: {yining.ma, qadeer.khan, cremers}@tum.de
^2 Munich Center for Machine Learning (MCML).
Abstract

In this paper, we propose MA-DV2F: Multi-Agent Dynamic Velocity Vector Field. It is a framework for simultaneously controlling a group of vehicles in challenging environments. The DV2F is generated for each vehicle independently and provides a map of the reference orientation and speed that a vehicle must attain at any point on the navigation grid such that it safely reaches its target. The field is dynamically updated depending on the speed and proximity of the ego-vehicle to other agents. This dynamic adaptation of the velocity vector field allows prevention of imminent collisions. Experimental results show that MA-DV2F outperforms concurrent methods in terms of safety, computational efficiency and accuracy in reaching the target when scaling to a large number of vehicles. The project page for this work can be found here: https://yininghase.github.io/MA-DV2F/

I INTRODUCTION

The task of multi-agent navigation has attracted widespread attention in recent years due to myriad applications in areas such as search and rescue missions [1], area exploration [2], pickup and delivery services [3], warehouses [4], self-driving cars [5] etc. The task of multi-agent path finding/navigation involves simultaneously directing a group of vehicles from their initial positions to their desired destinations while avoiding collisions with other agents. The task is known to be NP-hard even in the discrete setting [6]. An ideal algorithm must find the optimal solution in limited time. These are contradictory goals, since determining the optimal solution requires searching a larger solution space, which necessitates more time. In structured environments such as indoor spaces, prior knowledge and understanding of the layout impose constraints that can reduce the solution search space. In unstructured environments, there are no such constraints. This gives an algorithm the flexibility to find a solution; however, since the search space is much larger, there is no guarantee that the solution found is optimal. The problem is further exacerbated when the search space is continuous and the agents are non-holonomic vehicles, as the constraints arising from the vehicle kinematics add to the complexity.

There have been various techniques and heuristics attempting to find (near-)optimal trajectories for multiple agents. The methods can be divided into two primary categories: 1) learning based data-driven methods [7, 8] and 2) search/optimization based methods [9, 10]. Learning based algorithms involve training a neural network on data, with the understanding that the network will generalize at inference time. The training data should encompass all the situations that the model is expected to encounter at test time. This necessitates a large number of training samples. The greatest challenge with large training sets is determining the corresponding supervised labels, which might be too tedious to obtain. An alternative to supervised learning is to train using reinforcement learning (RL) [11], where the model explores the environment and acquires rewards or penalties depending on the actions taken. The model then exploits this experience to execute the correct control actions at test time. However, RL algorithms tend to be more sample-inefficient than supervised methods. In contrast, optimization [12] or search based [13] methods involve simultaneously optimizing trajectories for multiple vehicles. As more vehicles are added, the complexity of the optimization/search becomes intractable, making it infeasible to scale to a large number of vehicles [14, 15].

Figure 1: Shows the pipeline for MA-DV2F. The state of all vehicles at time $t$ is used to create the dynamic velocity vector field (DV2F) from which the reference control commands are determined. These commands are in turn used to determine the next state at time $t+1$ using the kinematic motion model. Note that the reference control commands can optionally be used to train a GNN in a self-supervised manner (indicated by dotted arrows). Meanwhile, the DV2F is shown for the black ego-vehicle with 2 other vehicles (blue & maroon) and 2 obstacles (green & brown) in the scene. The target state for the black ego-vehicle is shown by the dotted rectangle with a solid arrow at the top of the map. The dotted circle around the target state is the parking region. The black arrows indicate the ideal reference orientation as unit vectors which the ego-vehicle should attain at each different position on the map. The colored dotted circles around other vehicles and obstacles show the collision avoiding regions for the black ego-vehicle. Each black arrow in these regions is composed of an attractive target approaching component (gray arrow) and a repulsive collision avoidance component (colored arrow). Note that due to kinematic constraints, the reference orientation ($\mathbf{u}_{ref}$) might not be attainable for the ego-vehicle at its existing location. The shaded wedge in front of the ego-vehicle shows the region of reachable orientations at the next time step. The real orientation ($\mathbf{u}_{real}$) is therefore the physically attainable orientation of the ego-vehicle that is closest to the reference.

An example of the dynamic velocity vector field is shown on the project page: https://yininghase.github.io/MA-DV2F/#VD. Note that the DV2F for the neighboring blue & maroon vehicles is likewise created separately (not shown here).

In this paper, we propose the Multi-Agent Dynamic Velocity Vector Field (MA-DV2F), which generates reference orientation and speed vectors for every vehicle on the map. The vehicles then just need to follow the direction of their respective velocity vector fields to successfully reach their destinations. The vector field for each vehicle is generated independently and can be adapted dynamically depending on the vehicle's proximity to other agents (neighbouring vehicles or obstacles). Decoupling reduces the complexity of adding vehicles and allows for parallel generation of the DV2F of each vehicle, thereby increasing throughput. An added benefit of our approach is that the generated DV2F can be used to train a learning based graphical neural network (GNN) in a self-supervised manner. This self-supervised learning based approach neither requires tedious labeling of data nor necessitates sample-inefficient environment exploration. We test our framework in challenging, collision prone environments. A scenario is regarded as challenging if the trajectories of the different vehicles, considered independently of other agents, intersect at multiple places at the same time. This necessitates collision avoidance maneuvers for safe navigation towards the target.

Fig. 1 shows the pipeline for both MA-DV2F (left branch, solid arrows) and the optional training of its self-supervised GNN counterpart (right branch, dotted arrows). The input is a continuous state representation of all vehicles and the outputs are the corresponding continuous control variables. The vehicles are non-holonomic with a turning radius determined by the kinematic model.

We summarize the contribution of this letter as follows:

  • Our proposed MA-DV2F outperforms other concurrent learning and search based approaches for the task of multi-agent navigation in challenging, collision prone environments.

  • Even the self-supervised learning based counterpart of MA-DV2F scales better than other learning and search based methods.

  • MA-DV2F can determine solutions orders of magnitude faster than other SOTA search based approaches.

  • We shall release the complete code of MA-DV2F upon acceptance on the project page here: https://yininghase.github.io/MA-DV2F/.

The project page also contains additional supplementary information such as videos better depicting the operation of our method in comparison with other approaches, details of the challenging scenarios, dynamics of the velocity field at different regions of the navigation grid, etc.

II Related Work

[16] proposed using Artificial Potential Fields (APF) for trajectory planning. An agent is subjected to an attractive potential force towards the target, which serves as the sink, and a repulsive potential force away from obstacles [17, 18]. However, a common problem with such methods is their propensity to get stuck in local minima [19, 20] when the attractive force from the target is cancelled out by the repulsive force arising from another agent, e.g. when the ego-vehicle and other agents are symmetrically aligned with the target [21], leading to a bottleneck situation. We break such bottlenecks by enforcing that the vehicles move in the clockwise direction.

[13] proposed a two-level tree based search algorithm for multi-agent path finding. However, the tree may grow exponentially, making the search inefficient. This is because multi-agent path planning on trees and graphs is known to be NP-hard [22], since the search space grows exponentially as the number of agents rises [13]. Nevertheless, [14] used [13] to generate expert data for training a GNN model that can scale up to more vehicles than it was trained on. [23] uses RL and proposes a dense reward function to encourage environmental exploration. However, the need for exploration tends to make the learning sample-inefficient [24, 25], particularly when compared with imitation learning approaches [26]. [27] instead combines RL for single agent path planning with imitation learning to learn actions that can influence other agents. All approaches described above work on a discrete grid, use a discrete action space, assume holonomic robots, or some combination thereof.

CL-MAPF [9] uses a Body Conflict Tree to describe agent collision scenarios as spatiotemporal constraints. It then applies a Hybrid-State A* Search algorithm to generate paths satisfying both the kinematic and spatiotemporal constraints of the vehicles. However, under challenging test scenarios with vehicles crowding together, the algorithm takes a long time to search for a solution and can easily time out. To find feasible solutions for large-scale multi-vehicle trajectory planning, CSDO [10] first searches for a coarse initial guess using a large search step. Decentralized quadratic programming is then applied to refine this guess and resolve minor collisions. GCBF+ [8], which builds on GCBF [7], aims to provide safety guarantees utilizing control barrier functions (CBF). A graphical neural network is trained to learn the agent control policy.

III Framework

III-A Problem Description

We aim to solve the task of multi-agent navigation in unconstrained environments. Given $N_{veh}$ dynamic vehicles and $N_{obs}$ static obstacles in the scene, the task is to ensure each vehicle reaches its desired destination while avoiding collisions with other agents. The state vector for ego-vehicle $i$ ($1 \leq i \leq N_{veh}$) at current time $t$ can be represented as $\mathbf{s}_{t}^{(i)}=[x_{t},y_{t},\theta_{t},v_{t},x_{tar},y_{tar},\theta_{tar}]^{T}$, where $x_{t}$ and $y_{t}$ give the position, $\theta_{t}$ the orientation and $v_{t}$ the speed at current time $t$. Meanwhile, $x_{tar}$ and $y_{tar}$ are the coordinates of the target position, and $\theta_{tar}$ is the target orientation. Each ego-vehicle is controlled by a control vector $\mathbf{c}_{t}^{(i_{veh})}=[p_{t},\varphi_{t}]^{T}$, where $p_{t}\in[-P,P]$ and $\varphi_{t}\in[-\Phi,\Phi]$ are the pedal acceleration and steering angle, limited in magnitude by $P$ and $\Phi$ respectively. Obstacle $k$ ($1 \leq k \leq N_{obs}$) is represented by a state vector $\mathbf{s}^{k}=[x,y,r]^{T}$, where $x$ and $y$ give the position, and $r$ is the radius of the circle circumscribing all vertices of the obstacle. The kinematics of the vehicles is modeled using the bicycle model [28]:

\begin{split}x_{t+1}&=x_{t}+v_{t}\cdot\cos(\theta_{t})\cdot\Delta t\\ y_{t+1}&=y_{t}+v_{t}\cdot\sin(\theta_{t})\cdot\Delta t\\ \theta_{t+1}&=\theta_{t}+v_{t}\cdot\tan(\varphi_{t})\cdot\gamma\cdot\Delta t\\ v_{t+1}&=\beta\cdot v_{t}+p_{t}\cdot\Delta t\end{split} \quad (1)

It describes how the equations of motion are updated in time increments of $\Delta t$, assuming a no-slip condition, which is valid at the low to moderate vehicle speeds $v$ at which turns are made. Note that $\beta$ and $\gamma$ are tunable hyper-parameters modeling the environment friction and the vehicle wheelbase respectively.
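
To make the update concrete, below is a minimal Python sketch of Equation 1 (not the authors' released code); the values of $\beta$, $\gamma$ and $\Delta t$ are illustrative assumptions:

    import numpy as np

    # Assumed hyper-parameter values for illustration only.
    BETA, GAMMA, DT = 0.9, 1.0, 0.1

    def kinematic_step(state, control):
        """One step of the bicycle model in Equation 1.
        state = [x, y, theta, v], control = [pedal p, steering angle phi]."""
        x, y, theta, v = state
        p, phi = control
        return np.array([
            x + v * np.cos(theta) * DT,           # position update along heading
            y + v * np.sin(theta) * DT,
            theta + v * np.tan(phi) * GAMMA * DT, # heading update from steering
            BETA * v + p * DT,                    # speed update with friction
        ])

    # Example: vehicle at the origin, accelerating while turning left.
    print(kinematic_step([0.0, 0.0, 0.0, 1.0], [0.5, 0.2]))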

We create a velocity vector field for each vehicle, which in turn can dynamically generate reference control variables. Generation of the velocity field can be divided into two stages:

  • Estimation of the reference orientation of the ego-vehicle for every position on the map (see Subsection III-B).

  • Estimation of the reference speed for the corresponding positions (see Subsection III-C).

The orientation which a vehicle should possess at any position on the map is referred to as the Reference Orientation and is represented as a unit vector. The black arrows in Fig. 1 show the reference orientation for the black ego-vehicle at different positions on the map. It can be seen that the arrows attract the ego-vehicle towards its target while repelling it away from the other vehicles and obstacles. Note that the current orientation of the ego-vehicle is not aligned with the reference orientation at its existing position. Therefore, we ought to find the control variables which align the ego-vehicle with the reference orientation. Apart from the reference orientation at each position on the map, the ego-vehicle also has a reference speed which should be attained by the control variables. Lastly, note that apart from the black vehicle, separate reference orientation maps are also created for the blue and maroon vehicles present in Fig. 1 (not shown in the figure).

Hence, our task of multi-agent navigation simplifies to finding the reference orientation map and corresponding reference speed for each vehicle independently. In the subsequent subsections, we describe how the reference orientation and speed are estimated.

III-B Estimation of Reference Orientation

We first define some frequently used functions and vectors before we begin the discussion of reference orientation estimation. The vector function $\mathbf{f_{uni}}(\mathbf{a})$ takes a non-zero vector $\mathbf{a}$ as input and divides it by its magnitude to convert it into a unit vector.

Other scalar functions include $f_{sgn}(a)$ and $f_{pos}(a)$, which both output 1 if the scalar input $a$ is positive. However, $f_{sgn}(a)$ outputs -1, while $f_{pos}(a)$ outputs 0 when $a$ is negative. We now define the vector from the next position $(x^{(i)}_{t+1},y^{(i)}_{t+1})$ of ego vehicle $i$ to

  • its target position $(x^{(i)}_{tar},y^{(i)}_{tar})$ as $\mathbf{X}^{(i)}_{tar}=[x^{(i)}_{tar}-x^{(i)}_{t+1},y^{(i)}_{tar}-y^{(i)}_{t+1}]^{T}$

  • the position $(x^{(k)}_{obs},y^{(k)}_{obs})$ of the static obstacle $k$ as $\mathbf{X}^{(i)}_{obs_{k}}=[x^{(k)}_{obs}-x^{(i)}_{t+1},y^{(k)}_{obs}-y^{(i)}_{t+1}]^{T}$

  • the next position $(x^{(j)}_{t+1},y^{(j)}_{t+1})$ of another vehicle $j$ as $\mathbf{X}^{(i)}_{veh_{j}}=[x^{(j)}_{t+1}-x^{(i)}_{t+1},y^{(j)}_{t+1}-y^{(i)}_{t+1}]^{T}$

The unit vector along the Z-axis is $\mathbf{Z}=[0,0,1]^{T}$, while the unit orientation vectors of the ego vehicle at the current and target states are given by $\mathbf{U}^{(i)}_{t}=[\cos(\theta^{(i)}_{t}),\sin(\theta^{(i)}_{t})]^{T}$ and $\mathbf{U}^{(i)}_{tar}=[\cos(\theta^{(i)}_{tar}),\sin(\theta^{(i)}_{tar})]^{T}$ respectively.
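
A small Python sketch of these helper functions and relative vectors (our own illustrative formulation of the definitions above) is given below:

    import numpy as np

    def f_uni(a):
        """Normalize a non-zero vector to unit length."""
        return a / np.linalg.norm(a)

    def f_sgn(a):
        """+1 if the scalar a is positive, -1 if negative."""
        return 1.0 if a > 0 else -1.0

    def f_pos(a):
        """+1 if the scalar a is positive, 0 if negative."""
        return 1.0 if a > 0 else 0.0

    # Relative vector from the ego vehicle's next position to its target,
    # i.e. X_tar^(i); the positions here are arbitrary example values.
    ego_next = np.array([0.0, 0.0])
    target = np.array([4.0, 3.0])
    X_tar = target - ego_next
    U_t = np.array([np.cos(0.3), np.sin(0.3)])  # current orientation U_t^(i)
    print(f_uni(X_tar), f_sgn(X_tar @ U_t), f_pos(-2.0))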

For the ego vehicle $i$, the target reaching component of the reference orientation $\mathbf{u}^{(i)}_{tar}$ is defined as:

\begin{split}\mathbf{u}^{(i)}_{tar}&=\begin{cases}\mathbf{f_{uni}}(\mathbf{X}^{(i)}_{tar})\cdot\xi_{tar}^{(i)}&\|\mathbf{X}^{(i)}_{tar}\|_{2}>r_{p}\\ \mathbf{f_{uni}}(\mathbf{U}^{(i)}_{tar}+\lambda^{(i)}_{tar}\cdot\mathbf{f_{uni}}(\mathbf{X}^{(i)}_{tar}))&\text{otherwise}\end{cases}\\ \lambda^{(i)}_{tar}&=\left(\frac{\|\mathbf{X}^{(i)}_{tar}\|_{2}}{r_{p}}+f_{pos}(\|\mathbf{X}^{(i)}_{tar}\|_{2}-\epsilon_{p})\right)\cdot f_{sgn}({\mathbf{X}_{tar}^{(i)}}^{T}\cdot\mathbf{U}^{(i)}_{tar})\\ \xi_{tar}^{(i)}&=\begin{cases}1&\|\mathbf{X}^{(i)}_{tar}\|_{2}\geq 0.5\cdot v^{2}_{d}+r_{p}\\ f_{sgn}({\mathbf{X}_{tar}^{(i)}}^{T}\cdot\mathbf{U}^{(i)}_{t})&\text{otherwise}\end{cases}\end{split} \quad (2)

where $r_{p}$ is the parking threshold (black dotted circle in Fig. 1) and $0.5\cdot v^{2}_{d}$ is the marginal parking threshold (shaded blue region in Fig. 1). $v_{d}$ is the default reference speed, the estimation of which is explained in Subsection III-C. $\epsilon_{p}$ is the threshold above which an additional term is introduced when the vehicle is within the parking threshold.

Equation 2 shows that when the ego-vehicle $i$ is far away from the parking threshold, the reference orientation is in the direction of $\mathbf{X}^{(i)}_{tar}$. However, if the velocity is high as the ego-vehicle approaches the target, the vehicle might overshoot the target and enter the shaded marginal parking threshold region. In this case, the direction of the orientation is flipped using $f_{sgn}({\mathbf{X}_{tar}^{(i)}}^{T}\cdot\mathbf{U}^{(i)}_{t})$. This prevents the vehicle from moving in circles. A detailed explanation is given in the supplementary file on the project page: https://yininghase.github.io/MA-DV2F/supplementary.pdf.

Equation 2 also covers the condition when the distance $\|\mathbf{X}^{(i)}_{tar}\|_{2}$ falls below the parking threshold $r_{p}$, i.e. the vehicle is close to the target. The ego vehicle should be guided to not only reach the position of the target ($\mathbf{X}^{(i)}_{tar}=0$) but also to align with the target orientation ($\mathbf{U}^{(i)}_{tar}=\mathbf{U}^{(i)}_{t}$). The balancing act between these two conditions is handled by the $\lambda^{(i)}_{tar}$ term in Equation 2. As before, $f_{sgn}({\mathbf{X}_{tar}^{(i)}}^{T}\cdot\mathbf{U}^{(i)}_{tar})$ in $\lambda^{(i)}_{tar}$ makes sure the reference and target orientations point in the same direction when parking. The $f_{pos}(\|\mathbf{X}^{(i)}_{tar}\|_{2}-\epsilon_{p})$ term in $\lambda^{(i)}_{tar}$ is designed to expedite the parking behavior when the vehicle position is exactly on the left or the right side of the target and the vehicle orientation is parallel to the target orientation.
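
The following sketch implements the target-approaching component of Equation 2; the hyper-parameter values $r_p$, $\epsilon_p$ and $v_d$ are assumptions for illustration:

    import numpy as np

    R_P, EPS_P, V_D = 2.0, 0.1, 1.0  # assumed parking threshold, tolerance, default speed

    def f_uni(a): return a / np.linalg.norm(a)
    def f_sgn(a): return 1.0 if a > 0 else -1.0
    def f_pos(a): return 1.0 if a > 0 else 0.0

    def u_target(X_tar, U_t, U_tar):
        """Target-approaching component u_tar^(i) of Equation 2."""
        d = np.linalg.norm(X_tar)
        if d > R_P:
            # Far from the parking region: head towards the target, unless the
            # vehicle overshot into the marginal region (then flip direction).
            xi = 1.0 if d >= 0.5 * V_D**2 + R_P else f_sgn(X_tar @ U_t)
            return f_uni(X_tar) * xi
        # Inside the parking region: blend the target orientation with the
        # direction to the target, weighted by lambda_tar.
        lam = (d / R_P + f_pos(d - EPS_P)) * f_sgn(X_tar @ U_tar)
        return f_uni(U_tar + lam * f_uni(X_tar))

    print(u_target(np.array([5.0, 0.0]), np.array([1.0, 0.0]), np.array([0.0, 1.0])))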

Besides reaching the target, the ego vehicle $i$ should also avoid collisions on its way to the target. This is realized by the collision avoiding component $\mathbf{u}^{(i)}_{coll}$, which comprises collision avoidance between the ego vehicle $i$ and either a static obstacle $k$ ($\mathbf{u}^{(i)}_{obs_{k}}$) or another vehicle $j$ ($\mathbf{u}^{(i)}_{veh_{j}}$). The equation for determining $\mathbf{u}^{(i)}_{obs_{k}}$ is given by:

\begin{split}\mathbf{u}^{(i)}_{obs_{k}}&=\begin{cases}\mathbf{f_{uni}}(\mathbf{X}^{(i)}_{obs_{k}})\cdot\alpha^{(i)}_{obs_{k}}+\mathbf{R}^{(i)}_{obs_{k}}\cdot\beta^{(i)}_{obs_{k}}&\alpha^{(i)}_{obs_{k}}\leq 0\\ \mathbf{0}&\text{otherwise}\end{cases}\\ \alpha^{(i)}_{obs_{k}}&=\|\mathbf{X}^{(i)}_{obs_{k}}\|_{2}-r_{obs}^{(k)}-r_{veh}-(r_{c}+|v^{(i)}_{t}|)\\ \beta^{(i)}_{obs_{k}}&=f_{pos}({\mathbf{X}^{(i)}_{tar}}^{T}\cdot\mathbf{X}_{obs_{k}}^{(i)})\cdot(\|\mathbf{X}^{(i)}_{obs_{k}}\|_{2}-r_{obs}^{(k)})\\ \mathbf{R}^{(i)}_{obs_{k}}&=\mathbf{f_{uni}}(\mathbf{Z}\times\mathbf{X}^{(i)}_{obs_{k}})\end{split} \quad (3)

where $r^{(k)}_{obs}$ is the radius of the static obstacle $k$, $r_{veh}$ is the radius of the smallest circle enclosing the ego-vehicles, $r_{c}$ is the static component of the safety margin, while $|v^{(i)}_{t}|$ is the speed-based dynamic component of the safety margin for collision avoidance between ego vehicle $i$ and obstacle $k$. The higher the vehicle speed $|v^{(i)}_{t}|$, the larger this margin $r_{c}+|v^{(i)}_{t}|$. When the distance between the ego vehicle $i$ and the static obstacle $k$ ($\|\mathbf{X}^{(i)}_{obs_{k}}\|_{2}-r_{obs}^{(k)}-r_{veh}$) is below the collision avoidance margin $r_{c}+|v^{(i)}_{t}|$, then $\alpha^{(i)}_{obs_{k}}\leq 0$, and the reference orientation is modified to prevent collision with the static obstacle. Under this condition, the first term $\mathbf{f_{uni}}(\mathbf{X}^{(i)}_{obs_{k}})\cdot\alpha^{(i)}_{obs_{k}}$ guides the vehicle to drive away from the static obstacle. However, driving away is not enough, as this might cause a bottleneck in cases where the obstacle is symmetrically collinear between the ego-vehicle and its target. We would additionally like the ego-vehicle to drive around the obstacle to reach the target, for which the term $\mathbf{R}^{(i)}_{obs_{k}}\cdot\beta^{(i)}_{obs_{k}}$ is relevant. If the ego-vehicle is between the obstacle and the target, then $\beta^{(i)}_{obs_{k}}=0$ (since $f_{pos}({\mathbf{X}^{(i)}_{tar}}^{T}\cdot\mathbf{X}_{obs_{k}}^{(i)})=0$) and hence there is no need for the vehicle to drive around the obstacle. However, if that is not the case, then an additional component is added whose direction is given by $\mathbf{R}^{(i)}_{obs_{k}}$ and whose magnitude ($\beta^{(i)}_{obs_{k}}$) is proportional to how far the vehicle is from the obstacle. $\mathbf{R}^{(i)}_{obs_{k}}$ is given as the cross product between $\mathbf{Z}$ and $\mathbf{X}^{(i)}_{obs_{k}}$, with a zero appended to the third dimension of $\mathbf{X}^{(i)}_{obs_{k}}$ which originally lies in 2D space. The vector resulting from the cross product is perpendicular to $\mathbf{X}_{obs_{k}}^{(i)}$, which causes the vehicles to move in the clockwise direction around the obstacle as shown in Fig. 2.

Figure 2: Shows the constituents of the reference orientation vector $\mathbf{u}_{ref}$ near an obstacle. It comprises a target approaching component ($\mathbf{u}_{tar}$) and a collision avoiding component ($\mathbf{u}_{obs_{k}}$). $\mathbf{u}_{obs_{k}}$ includes $\mathbf{v}_{n}=\mathbf{f_{uni}}(\mathbf{X}^{(i)}_{obs_{k}})\cdot\alpha^{(i)}_{obs_{k}}$ guiding the vehicle to drive away from the obstacle and $\mathbf{v}_{t}=\mathbf{R}^{(i)}_{obs_{k}}\cdot\beta^{(i)}_{obs_{k}}$ leading the vehicle to go around the obstacle. When the ego-vehicle is between the obstacle and the target, it is not necessary for it to go around the obstacle and thus $\mathbf{v}_{t}=\mathbf{0}$. The formulation to calculate $\mathbf{u}_{obs_{k}}$ is described in Equation 3.
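
A hedged Python sketch of Equation 3 for a single obstacle follows; $r_{veh}$ and $r_c$ take assumed values:

    import numpy as np

    R_VEH, R_C = 0.5, 0.5  # assumed vehicle radius and static safety margin

    def f_uni(a): return a / np.linalg.norm(a)
    def f_pos(a): return 1.0 if a > 0 else 0.0

    def u_obstacle(X_obs, X_tar, r_obs, v_t):
        """Collision-avoiding component u_obs_k^(i) of Equation 3."""
        alpha = np.linalg.norm(X_obs) - r_obs - R_VEH - (R_C + abs(v_t))
        if alpha > 0:
            return np.zeros(2)  # outside the safety margin: no contribution
        # Repulsive part: alpha <= 0, so this points away from the obstacle.
        v_n = f_uni(X_obs) * alpha
        # Tangential part Z x X_obs: steers the vehicle around the obstacle.
        beta = f_pos(X_tar @ X_obs) * (np.linalg.norm(X_obs) - r_obs)
        v_t_vec = f_uni(np.cross([0.0, 0.0, 1.0], np.append(X_obs, 0.0))[:2]) * beta
        return v_n + v_t_vec

    print(u_obstacle(np.array([1.2, 0.0]), np.array([5.0, 0.0]), 0.3, 0.5))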

Likewise, the component for avoiding collision between the ego vehicle $i$ and another vehicle $j$ ($\mathbf{u}^{(i)}_{veh_{j}}$) is similar to Equation 3, except that the static obstacle radius $r_{obs}^{(k)}$ is replaced by the other vehicle's radius $r_{veh}$ in the $\alpha^{(i)}_{veh_{j}}$ term. Secondly, the speed-based dynamic margin is $|v^{(j)}_{t}|+|v^{(i)}_{t}|$ rather than just $|v^{(i)}_{t}|$.

The overall collision avoiding component $\mathbf{u}^{(i)}_{coll}$ is given by:

\mathbf{u}^{(i)}_{coll}=\sum_{k=1}^{N_{obs}}\mathbf{u}^{(i)}_{obs_{k}}+\sum_{j=1,j\neq i}^{N_{veh}}\mathbf{u}^{(i)}_{veh_{j}} \quad (4)

Finally, the ideal reference orientation vector $\mathbf{\hat{u}}^{(i)}_{t+1}$ ($\mathbf{u}_{ref}$ in Fig. 1) is:

\mathbf{\hat{u}}^{(i)}_{t+1}=\mathbf{f_{uni}}(\mathbf{u}^{(i)}_{tar}+\mathbf{u}^{(i)}_{coll}) \quad (5)

From this, the corresponding ideal reference orientation angle $\hat{\theta}^{(i)}_{t+1}$ for ego vehicle $i$ can be calculated by applying $\mathrm{arctan2}$ to $\mathbf{\hat{u}}^{(i)}_{t+1}$. However, kinematic constraints arising from the motion model (Equation 1) may prevent the vehicle from immediately attaining the ideal reference orientation $\hat{\theta}^{(i)}_{t+1}$ in the next time step. Therefore, we instead use $\theta^{(i)}_{t+1}$, referred to as the real orientation. It is the reachable orientation closest to $\hat{\theta}^{(i)}_{t+1}$. The unit vector corresponding to this real reference orientation angle $\theta^{(i)}_{t+1}$ for ego vehicle $i$ is $\mathbf{u}^{(i)}_{t+1}$ ($\mathbf{u}_{real}$ in Fig. 1) $=[\cos(\theta^{(i)}_{t+1}),\sin(\theta^{(i)}_{t+1})]^{T}$.
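
The sketch below combines Equations 4-5 and clips the ideal angle to the reachable wedge; the per-step orientation limit $|v_{t}|\cdot\tan(\Phi)\cdot\gamma\cdot\Delta t$ is our own reading of Equation 1, and all parameter values are assumptions:

    import numpy as np

    V_T, PHI_MAX, GAMMA, DT = 1.0, 0.6, 1.0, 0.1  # assumed speed, steering limit, etc.

    def f_uni(a): return a / np.linalg.norm(a)

    def real_orientation(u_tar, u_coll_list, theta_t):
        """Ideal angle from Equation 5, clipped to the reachable orientations."""
        u_hat = f_uni(u_tar + np.sum(u_coll_list, axis=0))  # Equations 4-5
        theta_hat = np.arctan2(u_hat[1], u_hat[0])          # ideal angle
        max_dtheta = abs(V_T) * np.tan(PHI_MAX) * GAMMA * DT
        # Wrap the angle difference to [-pi, pi) before clipping.
        dtheta = (theta_hat - theta_t + np.pi) % (2 * np.pi) - np.pi
        return theta_t + np.clip(dtheta, -max_dtheta, max_dtheta)

    print(real_orientation(np.array([0.0, 1.0]), [np.array([-0.3, 0.1])], 0.0))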

III-C Estimation of Reference Speed

The reference speed $v^{(i)}_{t+1}$ is chosen after determination of the reference orientation and depends on the situation the vehicle is in. Fig. 3 shows two scenarios wherein a vehicle is at the same location next to an obstacle but with opposite orientations. The orientation determines whether the velocity should move the car forward or backward.

(a) Forbidden Forward
(b) Forbidden Backward
Figure 3: Shows two different scenarios when the black ego vehicle is in a collision avoiding region. In Fig. 3(a), the ego vehicle is facing the obstacle, i.e. $\mathbf{u}^{T}\cdot\mathbf{X}_{obs}>0$, in which case the vehicle should be forbidden from moving forward. In Fig. 3(b), the ego vehicle is oriented away from the obstacle, i.e. $\mathbf{u}^{T}\cdot\mathbf{X}_{obs}<0$, in which case the vehicle should be prevented from moving backward.

We use the logical or ($\vee$) and logical and ($\wedge$) operators [29] to describe the criteria for collision avoidance between ego vehicle $i$ and static obstacle $k$ under such situations as:

\begin{split}F^{(i)}_{obs_{k}}&=(\alpha_{obs_{k}}^{(i)}+\epsilon_{c}\leq 0)\wedge(\gamma_{obs_{k}}^{(i)}>0)\\ B^{(i)}_{obs_{k}}&=(\alpha_{obs_{k}}^{(i)}+\epsilon_{c}\leq 0)\wedge(\gamma_{obs_{k}}^{(i)}<0)\\ \gamma^{(i)}_{obs_{k}}&={\mathbf{u}^{(i)}_{t+1}}^{T}\cdot\mathbf{X}^{(i)}_{obs_{k}}\end{split} \quad (6)

where $\alpha_{obs_{k}}^{(i)}$ is the same as defined in Equation 3 and $\epsilon_{c}$ is an additional tolerance for the collision checking region. $F^{(i)}_{obs_{k}}$ being $true$ indicates that the ego vehicle $i$ is forbidden from driving forwards towards obstacle $k$. This happens when the angle between the reference orientation ($\mathbf{u}^{(i)}_{t+1}$) and the vector from the ego vehicle to the obstacle ($\mathbf{X}^{(i)}_{obs_{k}}$) is less than $90^{\circ}$, i.e. $\gamma^{(i)}_{obs_{k}}={\mathbf{u}^{(i)}_{t+1}}^{T}\cdot\mathbf{X}^{(i)}_{obs_{k}}>0$. Likewise, $B^{(i)}_{obs_{k}}$ is $true$ when $\gamma^{(i)}_{obs_{k}}={\mathbf{u}^{(i)}_{t+1}}^{T}\cdot\mathbf{X}^{(i)}_{obs_{k}}<0$, indicating that the ego vehicle $i$ is forbidden from driving backwards.

Similarly, for the case of ego vehicle $i$ and another vehicle $j$, the conditions for preventing the ego-vehicle from moving forward $F^{(i)}_{veh_{j}}$ or backward $B^{(i)}_{veh_{j}}$ are the same as described in Equation 6, except that $\mathbf{X}^{(i)}_{veh_{j}}$ replaces $\mathbf{X}^{(i)}_{obs_{k}}$.
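
A sketch of the boolean flags of Equation 6 for one obstacle (the margin parameters are assumed, mirroring the $\alpha$ term of Equation 3):

    import numpy as np

    R_VEH, R_C, EPS_C = 0.5, 0.5, 0.1  # assumed radii and tolerance

    def forbidden_flags(X_obs, u_next, r_obs, v_t):
        """Return (F, B): forwards / backwards driving forbidden (Equation 6)."""
        alpha = np.linalg.norm(X_obs) - r_obs - R_VEH - (R_C + abs(v_t))
        gamma = u_next @ X_obs          # >0: facing the obstacle, <0: backing onto it
        inside = alpha + EPS_C <= 0     # within the padded collision region
        return (inside and gamma > 0), (inside and gamma < 0)

    # Vehicle pointing straight at an obstacle 1 m away: forwards is forbidden.
    print(forbidden_flags(np.array([1.0, 0.0]), np.array([1.0, 0.0]), 0.2, 0.3))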

Summarising the results above, the magnitude and sign of the velocity depend on the combination of these boolean variables, i.e. $F^{(i)}_{obs_{k}}$, $F^{(i)}_{veh_{j}}$, $B^{(i)}_{obs_{k}}$ and $B^{(i)}_{veh_{j}}$, as follows:

\begin{split}F^{(i)}_{coll}&=(\bigvee_{k=1}^{N_{obs}}F^{(i)}_{obs_{k}})\vee(\bigvee_{j=1,j\neq i}^{N_{veh}}F^{(i)}_{veh_{j}})\\ B^{(i)}_{coll}&=(\bigvee_{k=1}^{N_{obs}}B^{(i)}_{obs_{k}})\vee(\bigvee_{j=1,j\neq i}^{N_{veh}}B^{(i)}_{veh_{j}})\\ v^{(i)}_{coll}&=\begin{cases}v_{d}&B^{(i)}_{coll}\wedge(\neg F^{(i)}_{coll})\\ -v_{d}&(\neg B^{(i)}_{coll})\wedge F^{(i)}_{coll}\\ 0&B^{(i)}_{coll}\wedge F^{(i)}_{coll}\\ v_{tar}^{(i)}&\text{otherwise}\end{cases}\end{split} \quad (7)

The first three cases check whether or not there is a potential for collision. The velocity is $+v_{d}$ when the ego-vehicle is prevented from moving backwards ($B^{(i)}_{coll}$) but allowed to move forward ($\neg F^{(i)}_{coll}$), and $-v_{d}$ vice versa. The reference velocity is zero when the ego-vehicle is prevented from moving both forward ($F^{(i)}_{coll}$) and backward ($B^{(i)}_{coll}$). In all other cases, the velocity is $v_{tar}^{(i)}$, defined as:

\begin{split}v^{(i)}_{tar}&=\begin{cases}\xi^{(i)}_{p}\cdot\lambda^{(i)}_{p}\cdot v_{d}&\|\mathbf{X}^{(i)}_{tar}\|_{2}\leq r_{p}\\ v_{d}\cdot f_{sgn}({\mathbf{u}^{(i)}_{t+1}}^{T}\cdot\mathbf{\hat{u}}^{(i)}_{t+1})&\text{otherwise}\end{cases}\\ \lambda^{(i)}_{p}&=\begin{cases}\bar{\lambda}^{(i)}_{p}&(\|\mathbf{X}^{(i)}_{tar}\|_{2}<\epsilon_{p})\wedge(|\theta^{(i)}_{tar}-\theta^{(i)}_{t+1}|<\epsilon_{o})\\ \sqrt{\bar{\lambda}^{(i)}_{p}}&\text{otherwise}\end{cases}\\ \bar{\lambda}^{(i)}_{p}&=\mathrm{minimum}\left(\frac{\|\mathbf{X}^{(i)}_{tar}\|_{2}}{r_{p}}+\frac{|\theta^{(i)}_{tar}-\theta^{(i)}_{t+1}|}{v_{d}},1\right)\\ \xi^{(i)}_{p}&=\begin{cases}1&{\mathbf{u}^{(i)}_{t+1}}^{T}\cdot\mathbf{X}^{(i)}_{tar}>0.25\\ -1&{\mathbf{u}^{(i)}_{t+1}}^{T}\cdot\mathbf{X}^{(i)}_{tar}<-0.25\\ f_{sgn}(v^{(i)}_{t})&\text{otherwise}\end{cases}\end{split} \quad (8)

where $\epsilon_{p}$ and $\epsilon_{o}$ are the acceptable position and orientation tolerances for deciding to stop the vehicle at the target. When the ego vehicle is within the parking radius, it reduces its speed as it gets closer to the target position and orientation. This is achieved by the multiplier $\lambda^{(i)}_{p}$, which is equal to $\bar{\lambda}^{(i)}_{p}$ when the vehicle state is very close to the target state and $\sqrt{\bar{\lambda}^{(i)}_{p}}$ when farther away from the tolerance. The square root accelerates the vehicle approaching its target when there is still some distance between the current state and the target state, i.e. $\neg((\|\mathbf{X}^{(i)}_{tar}\|_{2}<\epsilon_{p})\wedge(|\theta^{(i)}_{tar}-\theta^{(i)}_{t+1}|<\epsilon_{o}))$. When the vehicle is already very close to the target state, i.e. $(\|\mathbf{X}^{(i)}_{tar}\|_{2}<\epsilon_{p})\wedge(|\theta^{(i)}_{tar}-\theta^{(i)}_{t+1}|<\epsilon_{o})$, the ratio $\bar{\lambda}^{(i)}_{p}$ prevents the vehicle from oscillating forward and backward. For $\bar{\lambda}^{(i)}_{p}$, the first term, $\frac{1}{r_{p}}\cdot\|\mathbf{X}^{(i)}_{tar}\|_{2}$, decreases linearly with the ego vehicle's distance to its target, and the second term, $\frac{1}{v_{d}}\cdot|\theta^{(i)}_{tar}-\theta^{(i)}_{t+1}|$, reduces linearly with the angle difference between the reference ($\theta^{(i)}_{t+1}$) and target ($\theta^{(i)}_{tar}$) orientation angles. However, we do not allow the reference parking speed $v_{tar}^{(i)}$ to be any higher than the default reference speed $v_{d}$, so $\bar{\lambda}^{(i)}_{p}$ is clipped to a maximum of $1$. $\xi^{(i)}_{p}$ controls whether the ego vehicle moves forwards or backwards by checking ${\mathbf{u}^{(i)}_{t+1}}^{T}\cdot\mathbf{X}^{(i)}_{tar}$. Normally, the vehicle should move forwards when facing the target, i.e. ${\mathbf{u}^{(i)}_{t+1}}^{T}\cdot\mathbf{X}^{(i)}_{tar}>0$, and move backwards when backing toward the target, i.e. ${\mathbf{u}^{(i)}_{t+1}}^{T}\cdot\mathbf{X}^{(i)}_{tar}<0$. However, to prevent the vehicle from changing direction at high frequency within short traveling distances, we set a margin allowing the vehicle to keep its previous direction when $|{\mathbf{u}^{(i)}_{t+1}}^{T}\cdot\mathbf{X}^{(i)}_{tar}|\leq 0.25$.

When the ego vehicle $i$ is outside the parking area of radius $r_{p}$, the reference speed takes the form $v_{d}\cdot f_{sgn}({\mathbf{u}^{(i)}_{t+1}}^{T}\cdot\mathbf{\hat{u}}^{(i)}_{t+1})$: it takes the default value $v_{d}$, but changes sign to $-v_{d}$ when ${\mathbf{u}^{(i)}_{t+1}}^{T}\cdot\mathbf{\hat{u}}^{(i)}_{t+1}<0$.

The final ideal reference speed $\hat{v}^{(i)}_{t+1}$ for ego vehicle $i$ is:

\hat{v}^{(i)}_{t+1}=v^{(i)}_{coll} \quad (9)

Similar to the reference orientation, the ideal reference speed $\hat{v}^{(i)}_{t+1}$ may not be achievable due to the limit on the maximum pedal command. The real reference speed $v^{(i)}_{t+1}$ is therefore the reachable speed value closest to $\hat{v}^{(i)}_{t+1}$.
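
The speed selection of Equation 7 reduces to a few boolean checks; the sketch below reads the case conditions as conjunctions, as the accompanying text describes, and treats $v_d$ and $v_{tar}$ as given:

    V_D = 1.0  # assumed default reference speed

    def reference_speed(F_flags, B_flags, v_tar):
        """Pick v_coll^(i) from the aggregated flags of Equation 7."""
        F_coll = any(F_flags)  # forwards forbidden by any obstacle/vehicle
        B_coll = any(B_flags)  # backwards forbidden by any obstacle/vehicle
        if B_coll and not F_coll:
            return V_D         # only forwards is free
        if F_coll and not B_coll:
            return -V_D        # only backwards is free
        if F_coll and B_coll:
            return 0.0         # boxed in: stop
        return v_tar           # no conflict: target-approaching speed (Eq. 8)

    print(reference_speed([True, False], [False, False], v_tar=0.7))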

Calculation of Reference Steering Angle and Reference Pedal: Given the reference orientation and velocity, the vehicle kinematic model (Equation 1) can be inverted to determine the reference steering angle $\varphi^{(i)}_{t}$ and pedal acceleration $p^{(i)}_{t}$ for controlling the vehicle at time $t$:

\begin{split}\varphi^{(i)}_{t}&=\mathrm{arctan2}\left(\frac{\theta^{(i)}_{t+1}-\theta^{(i)}_{t}}{v^{(i)}_{t}\cdot\gamma\cdot\Delta t}\right)\\ p^{(i)}_{t}&=\frac{v^{(i)}_{t+1}-\beta\cdot v^{(i)}_{t}}{\Delta t}\end{split} \quad (10)
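
A sketch of the inversion in Equation 10 is given below; the two-argument arctan2 form is our interpretation of the paper's notation, and $\beta$, $\gamma$, $\Delta t$ are assumed values:

    import numpy as np

    BETA, GAMMA, DT = 0.9, 1.0, 0.1  # assumed hyper-parameters

    def reference_controls(theta_t, theta_next, v_t, v_next):
        """Invert Equation 1 to get the reference steering and pedal commands."""
        # tan(phi) = (theta_next - theta_t) / (v_t * gamma * dt), from Equation 1
        phi = np.arctan2(theta_next - theta_t, v_t * GAMMA * DT)
        pedal = (v_next - BETA * v_t) / DT
        return phi, pedal

    print(reference_controls(0.0, 0.05, 1.0, 1.05))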

These reference control commands can either be directly used to control the vehicles, or serve as labels for training the GNN model in a self-supervised manner using the architecture of [12].

IV EXPERIMENTS

IV-A Algorithm Evaluation

To assess the performance of our MA-DV2F approach and its learning based counterpart (the self-supervised GNN model), we compare against other recent learning and search/optimization based algorithms on challenging, collision prone test cases. The recent learning based approaches include [12], a GNN model trained in a supervised manner, and GCBF+ [8], an extension of [7] and an RL-inspired GNN model incorporating safety constraints. Meanwhile, the two search/optimization based algorithms used for comparison are CL-MAPF [9] and CSDO [10]. The test dataset comprises challenging, collision prone scenarios with the number of vehicles ranging from 10 to 50 and the number of static obstacles chosen between 0 and 25. Note that all the static obstacles in the test cases are fixed to be circles with a fixed radius, both because of the limitations of the CL-MAPF and CSDO environment settings and because a circle circumscribes an obstacle of arbitrary shape, allowing for safer behaviour. For GCBF+, we use the DubinsCar model because its vehicle kinematics are most similar to ours. As GCBF+ operates in a completely different working domain with different map and agent sizes, we scale our test cases when running GCBF+ such that the ratio of the map size to the agent size remains equal across all the algorithms. Details regarding the generation of test samples are given in the supplementary file on the project page.

Table I shows the evaluation results of the different methods across the different vehicle-obstacle combinations described earlier against the success rate metric [8]. It measures the percentage of vehicles that successfully reach their targets within a tolerance without colliding with other objects along the way. The results show that our MA-DV2F outperforms the other algorithms, achieving an almost 100% success rate across all vehicle-obstacle combinations. Even the self-supervised GNN model performs far better than the other algorithms when scaling the number of vehicles. Only the CSDO algorithm has slightly better results than our GNN model when the number of agents is low (at most 20). However, CSDO's performance drops drastically as the number of agents is increased under these challenging conditions. Note that the GCBF+ pipeline only checks whether the agents reach their target positions but ignores the target orientations, as shown on the project page: https://yininghase.github.io/MA-DV2F/#MC, which explains its poor performance. Therefore, we additionally evaluate GCBF+ by ignoring the orientation and only considering the position. The results are shown in the second-to-last column. Even then, GCBF+ does not match the performance of MA-DV2F.

Number of Number of MA-DV2F Self-Supervised CSDO CL-MAPF GCBF+ GCBF+ Supervised
Vehicles Obstacles (Ours) Model (Ours) [10] [9] [8] (only position) Model [12]
success rate ↑
10 0 1.0000 0.9929 0.9693 0.4290 0.0613 0.9021 0.7285
20 0 1.0000 0.9895 0.9400 0.4963 0.0563 0.8169 0.2041
30 0 1.0000 0.9793 0.8820 0.4977 0.0458 0.7550 0.0477
40 0 1.0000 0.9760 0.7063 0.5632 0.0451 0.7080 0.0137
50 0 1.0000 0.9686 0.6523 0.5992 0.0426 0.6618 0.0045
10 25 0.9952 0.9458 0.9682 0.5640 0.0509 0.7192 0.3734
20 25 0.9902 0.9208 0.9663 0.5997 0.0393 0.6479 0.0756
30 25 0.9844 0.8998 0.8456 0.6320 0.0385 0.5754 0.0196
40 25 0.9772 0.8723 0.6723 0.6531 0.0330 0.5211 0.0068
50 25 0.9704 0.8504 0.5897 0.6749 0.0290 0.4789 0.0028
TABLE I: Shows the success rate metric (higher is better) for the different models, i.e. our MA-DV2F, our self-supervised GNN model, CSDO [10], CL-MAPF [9], GCBF+ [8] and the supervised GNN model [12]. As GCBF+ does not consider vehicle orientation when reaching the target, we additionally evaluate its success rate by only checking the vehicle position and disregarding the orientation. For each row, the bold numbers show the highest success rate among all algorithms.

IV-B Discussion

Investigating Failure Causes: Note that the success rate metric measures a model's ability to not only reach its destination but also avoid collisions. Therefore, on challenging test cases, a navigation algorithm may fail for two main reasons: it behaves too aggressively, driving the vehicles to their targets albeit at the cost of colliding with other agents, or it behaves too conservatively to ensure the safety of the vehicles, resulting in some vehicles getting stuck mid-way without reaching their targets. To disambiguate between the two causes, we additionally evaluate all algorithms against the Reach rate and Safe rate metrics as proposed in [8]. The Reach rate measures the percentage of vehicles that reach their targets within a tolerance, disregarding any collisions along the way. Meanwhile, the Safe rate measures the percentage of vehicles that do not collide with any other agents, irrespective of whether or not they reach their targets. Fig. 4 shows the performance of the different methods against the Reach rate and Safe rate metrics. It can be seen that the supervised GNN model behaves rather aggressively, reaching the target in the majority of cases albeit at the cost of multiple collisions, leading to a high Reach but low Safe rate. In contrast, CL-MAPF [9] takes a greedy strategy in its optimization/search pipeline. It can quickly find paths for some vehicles which are easy to plan, but is then unable to find paths for the other vehicles and gets stuck in this partial solution. This greedy strategy leads to a low Reach rate, but a high Safe rate, since no collisions happen among vehicles that do not move.

GCBF+ has a higher Safe rate than Reach rate. This is not because the vehicles fail to reach their targets, but rather because they reach the target at an incorrect orientation. This aligns with the intuition described in IV-A and is further corroborated by the fact that when orientation is ignored in the evaluation and only position is considered, the Reach rate jumps drastically. Nevertheless, it is still much lower than both our MA-DV2F and its self-supervised GNN counterpart. These are the only two methods which maintain consistently high performance on both metrics across all vehicle-obstacle combinations.

(a) safe rate with no obstacles
(b) safe rate with 25 obstacles
(c) reach rate with no obstacles
(d) reach rate with 25 obstacles
Figure 4: Shows the Safe and Reach rate metrics (higher is better) for the different models, i.e. our MA-DV2F, our self-supervised GNN model, the supervised GNN model [12], CSDO [10], CL-MAPF [9] and GCBF+ [8].

Preventing Bottlenecks: A common problem with other planning algorithms is that vehicles tend to crowd within a small sub-region of the map. This leaves these algorithms either unable to find an optimal path, resulting in all vehicles becoming stationary in place (low Reach rate), or finding sub-optimal paths, resulting in more collisions among vehicles (low Safe rate). To prevent such bottlenecks in our MA-DV2F model, Equation 3 causes all the vehicles to drive in the clockwise direction when encountering other agents. This leads to a roundabout driving behavior which breaks the bottleneck and can be visualized on the project page: https://yininghase.github.io/MA-DV2F/#RE. Due to this, the vehicles are capable of eventually reaching their targets by making a detour around the other agents, resulting in both a high Reach and Safe rate. Our trained GNN model also adopts this behaviour.

Intermediate Success Rate: One might wonder: if the MA-DV2F method outperforms every other method, what is the advantage of its self-supervised GNN model counterpart? One reason is that MA-DV2F behaves conservatively in some simple test cases even though there is no collision risk nearby, causing the vehicles to take a longer time to finish the journey. This is because the speed is limited to $v_d$. The self-supervised GNN counterpart, on the other hand, is free from this restriction. It can move faster when it is far away from its target and there is little risk of collision with other objects. The figures on the project page, https://yininghase.github.io/MA-DV2F/#ISR, show the difference in success rate between the self-supervised GNN model and MA-DV2F as a function of time. At the beginning, when the vehicles tend to be far away from their targets, the GNN allows high velocities, thereby causing some vehicles to reach their target faster and leading to a success rate greater than that of MA-DV2F. This can also be seen in the example on the project page, https://yininghase.github.io/MA-DV2F/#MC, where the vehicles controlled by the GNN drive faster towards the targets than those controlled by MA-DV2F. However, maintaining a high velocity also leads to a higher risk of collision when encountering challenging situations as time progresses, so the success rate of MA-DV2F gradually catches up and eventually exceeds that of the GNN.

Number of Number of MA-DV2F CSDO CL-MAPF
Vehicles Obstacles (Ours) [10] [9]
total runtime for 1000 test cases ↓
10 0 4.152 195.919 125643.851
20 0 5.848 1044.360 219251.537
30 0 7.885 6179.536 328471.949
40 0 8.492 12342.014 376417.421
50 0 10.512 18306.779 395962.103
10 25 10.322 352.096 89838.921
20 25 12.260 2213.838 163001.473
30 25 13.733 9148.955 190907.288
40 25 15.562 14350.735 224962.016
50 25 18.405 16598.068 253495.187
TABLE II: Shows the total runtime in seconds (lower is better) over the 1000 test cases for each vehicle-obstacle combination for the different models, i.e. our MA-DV2F, CSDO [10] and CL-MAPF [9].

Runtime Analysis: We compare the runtime of MA-DV2F with the concurrent search based methods (CSDO and CL-MAPF). Table II shows the total runtime over all 1000 test cases for each vehicle-obstacle combination. All methods were evaluated on a machine with an Intel Core i7-10750H CPU and a GeForce RTX 2070 GPU. As can be seen, our MA-DV2F is orders of magnitude faster than its peers, since it has the ability to run the different scenarios within a batch in parallel. Meanwhile, CSDO and CL-MAPF are search/optimization based algorithms that need to solve each test case one by one. Note that CL-MAPF continues its search until a solution is found, which is why the maximum allowed runtime needs to be clipped; otherwise its runtime would be even larger. Note also that the evaluations in the experiments were done for only up to 50 vehicles, since the other algorithms are either extremely slow or suffer drastically reduced performance when scaling. However, our method is capable of scaling to far more than 50 vehicles, as can be observed on the project page for scenarios with 100, 125 and 250 vehicle-obstacle combinations: https://yininghase.github.io/MA-DV2F/#LE.

Lastly, note that the runtime analysis is not done for the learning based methods, since their runtime depends on the GPU rather than the algorithm itself. Therefore, for the same model architecture, the runtime would be the same for all learning based algorithms.

Limitations: If the vehicles are densely packed or their targets are within the safety margins of one another, then due to their non-holonomic behavior there might not be enough space for them to navigate without risking a collision. In such a scenario, the vehicles act conservatively and hesitate to proceed so as to avoid collisions, leading to fewer vehicles reaching the target. A similar outcome is observed if some vehicles start behaving unexpectedly, wherein the safety margin would need to be increased to mitigate the risk of collision, albeit at the expense of reaching the target. More details on the sensitivity analysis of the effect of changing the static component of the safety margin ($r_c$), and visualizations of the limitations, are given in the supplementary file and on the project page.

V Conclusion

This work introduced MA-DV2F, an efficient algorithm for multi-vehicle navigation using dynamic velocity vector fields. Experimental results showed that our approach seamlessly scales with the number of vehicles. When compared with other concurrent learning and search based methods, MA-DV2F has higher success rate, lower collision metrics and higher computational efficiency.
Acknowledgements: We thank Ang Li for the discussions and his feedback on the setup of the experiments.

References

  • [1] X. Cao, M. Li, Y. Tao, and P. Lu, "Hma-sar: Multi-agent search and rescue for unknown located dynamic targets in completely unknown environments," IEEE Robotics and Automation Letters, 2024.
  • [2] L. Bartolomei, L. Teixeira, and M. Chli, "Fast multi-uav decentralized exploration of forests," IEEE Robotics and Automation Letters, 2023.
  • [3] F. Kudo and K. Cai, "A tsp-based online algorithm for multi-task multi-agent pickup and delivery," IEEE Robotics and Automation Letters, 2023.
  • [4] B. Li and H. Ma, "Double-deck multi-agent pickup and delivery: Multi-robot rearrangement in large-scale warehouses," IEEE Robotics and Automation Letters, 2023.
  • [5] D. Zhu, Q. Khan, and D. Cremers, "Multi-vehicle trajectory prediction and control at intersections using state and intention information," Neurocomputing, p. 127220, 2024.
  • [6] B. Nebel, "On the computational complexity of multi-agent pathfinding on directed graphs," in Proceedings of the International Conference on Automated Planning and Scheduling, 2020.
  • [7] S. Zhang, K. Garg, and C. Fan, "Neural graph control barrier functions guided distributed collision-avoidance multi-agent control," in Conference on Robot Learning. PMLR, 2023, pp. 2373–2392.
  • [8] S. Zhang, O. So, K. Garg, and C. Fan, "Gcbf+: A neural graph control barrier function framework for distributed safe multi-agent control," IEEE Transactions on Robotics, pp. 1–20, 2025.
  • [9] L. Wen, Y. Liu, and H. Li, "Cl-mapf: Multi-agent path finding for car-like robots with kinematic and spatiotemporal constraints," Robotics and Autonomous Systems, 2022.
  • [10] Y. Yang, S. Xu, X. Yan, J. Jiang, J. Wang, and H. Huang, "Csdo: Enhancing efficiency and success in large-scale multi-vehicle trajectory planning," IEEE Robotics and Automation Letters, 2024.
  • [11] R. S. Sutton, "Reinforcement learning: An introduction," A Bradford Book, 2018.
  • [12] Y. Ma, Q. Khan, and D. Cremers, "Multi agent navigation in unconstrained environments using a centralized attention based graphical neural network controller," in IEEE 26th International Conference on Intelligent Transportation Systems, 2023.
  • [13] G. Sharon et al., "Conflict-based search for optimal multi-agent pathfinding," Artificial Intelligence, 2015.
  • [14] Q. Li, F. Gama, A. Ribeiro, and A. Prorok, "Graph neural networks for decentralized multi-robot path planning," in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2020.
  • [15] Y. Ma, A. Li, Q. Khan, and D. Cremers, "Enhancing the performance of multi-vehicle navigation in unstructured environments using hard sample mining," arXiv preprint arXiv:2409.05119, 2024.
  • [16] O. Khatib, "Real-time obstacle avoidance for manipulators and mobile robots," in Proceedings, 1985 IEEE International Conference on Robotics and Automation, vol. 2. IEEE, 1985, pp. 500–505.
  • [17] M. Reda, A. Onsy, A. Y. Haikal, and A. Ghanbari, "Path planning algorithms in the autonomous driving system: A comprehensive review," Robotics and Autonomous Systems, vol. 174, p. 104630, 2024.
  • [18] A. Loganathan and N. S. Ahmad, "A systematic review on recent advances in autonomous mobile robot navigation," Engineering Science and Technology, an International Journal, vol. 40, p. 101343, 2023.
  • [19] Á. Madridano, A. Al-Kaff, D. Martín, and A. De La Escalera, "Trajectory planning for multi-robot systems: Methods and applications," Expert Systems with Applications, vol. 173, p. 114660, 2021.
  • [20] J. Hou, G. Chen, J. Huang, Y. Qiao, L. Xiong, F. Wen, A. Knoll, and C. Jiang, "Large-scale vehicle platooning: Advances and challenges in scheduling and planning techniques," Engineering, 2023.
  • [21] I. Iswanto et al., "Artificial potential field algorithm implementation for quadrotor path planning," International Journal of Advanced Computer Science and Applications, 2019.
  • [22] J. Yu and S. LaValle, "Structure and intractability of optimal multi-robot path planning on graphs," in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 27, no. 1, 2013, pp. 1443–1449.
  • [23] H.-T. Wai, Z. Yang, Z. Wang, and M. Hong, "Multi-agent reinforcement learning via double averaging primal-dual optimization," Advances in Neural Information Processing Systems, vol. 31, 2018.
  • [24] D. Li, D. Zhao, Q. Zhang, and Y. Chen, "Reinforcement learning and deep learning based lateral control for autonomous driving [application notes]," IEEE Computational Intelligence Magazine, 2019.
  • [25] N. Vithayathil Varghese and Q. H. Mahmoud, "A survey of multi-task deep reinforcement learning," Electronics, vol. 9, no. 9, 2020.
  • [26] A. Dosovitskiy, G. Ros, F. Codevilla, A. Lopez, and V. Koltun, "CARLA: An open urban driving simulator," in Proceedings of the 1st Annual Conference on Robot Learning, 2017.
  • [27] G. Sartoretti, J. Kerr, Y. Shi, G. Wagner, T. K. S. Kumar, S. Koenig, and H. Choset, "Primal: Pathfinding via reinforcement and imitation multi-agent learning," IEEE Robotics and Automation Letters, 2019.
  • [28] D. Wang and F. Qi, "Trajectory planning for a four-wheel-steering vehicle," in IEEE International Conference on Robotics and Automation, 2001.
  • [29] M. M. Mano, Digital Logic and Computer Design. Pearson Education India, 2017.

VI Supplementary

VI-A Further Explanation of MA-DV2F

In this subsection, we further elaborate on some of the design choices for Equation 2 used to determine the reference orientation. The equation is repeated here for clarity:

\begin{split}\mathbf{u}^{(i)}_{tar}&=\begin{cases}\mathbf{f_{uni}}(\mathbf{X}^{(i)}_{tar})\cdot\xi_{tar}^{(i)}&\|\mathbf{X}^{(i)}_{tar}\|_{2}>r_{p}\\ \mathbf{f_{uni}}(\mathbf{U}^{(i)}_{tar}+\lambda^{(i)}_{tar}\cdot\mathbf{f_{uni}}(\mathbf{X}^{(i)}_{tar}))&\text{otherwise}\end{cases}\\ \lambda^{(i)}_{tar}&=\left(\frac{\|\mathbf{X}^{(i)}_{tar}\|_{2}}{r_{p}}+f_{pos}(\|\mathbf{X}^{(i)}_{tar}\|_{2}-\epsilon_{p})\right)\cdot f_{sgn}({\mathbf{X}_{tar}^{(i)}}^{T}\cdot\mathbf{U}^{(i)}_{tar})\\ \xi_{tar}^{(i)}&=\begin{cases}1&\|\mathbf{X}^{(i)}_{tar}\|_{2}\geq 0.5\cdot v^{2}_{d}+r_{p}\\ f_{sgn}({\mathbf{X}_{tar}^{(i)}}^{T}\cdot\mathbf{U}^{(i)}_{t})&\text{otherwise}\end{cases}\end{split}
(a) Keep Reference Orientation
(b) Reverse Reference Orientation
Figure 5: Shows the two different circumstances of the ego-vehicle overshooting its target and entering the marginal parking region (shaded blue region). In Fig. 5(a), the reference orientation vector $\mathbf{u}_{ref}=\mathbf{f_{uni}}(\mathbf{X}^{(i)}_{tar})$ points towards the target, while the current vehicle orientation is opposite to the reference orientation. Thus, the vehicle needs to turn around to match the reference orientation. In Fig. 5(b), the reference orientation vector $\mathbf{u}_{ref}=-\mathbf{f_{uni}}(\mathbf{X}^{(i)}_{tar})$ is flipped based on the current vehicle orientation. In this case, the vehicle does not need to turn around but can directly move backwards to the target.

Circular Motion Prevention: Equation 2 shows that when the ego-vehicle $i$ is far away from the parking threshold, the reference orientation points in the direction of $\mathbf{X}^{(i)}_{tar}$. However, if the ego-vehicle approaches the target at high speed, it might overshoot the target and enter the shaded marginal parking threshold region, causing the current ego-vehicle orientation $\mathbf{U}^{(i)}_{t}$ to be opposite to $\mathbf{X}^{(i)}_{tar}$, as seen in Fig. 5(a). This would make the vehicle go in circles in an attempt to align itself with the reference orientation. A better alternative is to switch the reference orientation direction so that it is aligned with the ego-vehicle orientation and drive the car in reverse, as shown in Fig. 5(b). This is done by dynamically switching the reference orientation depending on the sign of the dot product $f_{sgn}({\mathbf{X}_{tar}^{(i)}}^{T}\cdot\mathbf{U}^{(i)}_{t})$.

(a) Without Refining Term
(b) With Refining Term
Figure 6: Shows the reference orientations near the target with and without the refining term $f_{pos}(\|\mathbf{X}^{(i)}_{tar}\|_{2}-\epsilon_{p})$ in $\lambda^{(i)}_{tar}$ of Equation 2. Fig. 6(a) shows the situation without the refining term: the reference direction arrows to the left and right of the target position are nearly parallel to the target direction, which leads to the vehicle oscillating forwards and backwards instead of approaching the target. Therefore, the refining term $f_{pos}(\|\mathbf{X}^{(i)}_{tar}\|_{2}-\epsilon_{p})$ is added to $\lambda^{(i)}_{tar}$. The new reference direction arrows shown in Fig. 6(b) are biased more strongly towards the target position, which accelerates the parking process.

Parking Behaviour Refinement: In Equation 2, an additional term $f_{pos}(\|\mathbf{X}^{(i)}_{tar}\|_{2}-\epsilon_{p})$ was introduced in $\lambda^{(i)}_{tar}$ to refine the parking behavior when the vehicle is positioned exactly to the left or right of the target. As can be seen in Fig. 6, this refinement term biases the reference orientation more strongly towards the target position within the parking region. The ego-vehicle thus aligns itself better with the reference orientation while reaching the final target more quickly. Removing this term would make the reference orientations nearly parallel to the final target orientation, causing the ego-vehicle to oscillate forwards and backwards around the target and prolonging the parking time. This is particularly true when the ego-vehicle is exactly to the left or right of the target.
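For concreteness, the following is a minimal NumPy sketch of how Equation 2 can be evaluated, covering both the circular-motion prevention (the $f_{sgn}$ flip) and the parking refinement term. The helper names mirror the paper's notation, the default values of $r_p$, $v_d$ and $\epsilon_p$ are those listed in Section VI-D, and the sketch is illustrative rather than the authors' implementation:

```python
import numpy as np

def f_uni(v):
    """Normalize a vector to unit length (returns the input if degenerate)."""
    n = np.linalg.norm(v)
    return v / n if n > 1e-9 else v

def f_sgn(x):
    """Sign function: +1 for non-negative input, -1 otherwise."""
    return 1.0 if x >= 0.0 else -1.0

def f_pos(x):
    """Positive part: max(x, 0)."""
    return max(x, 0.0)

def reference_orientation(X_tar, U_tar, U_t, r_p=5.0, v_d=2.5, eps_p=0.25):
    """Reference orientation u_tar of Equation 2.

    X_tar: vector from ego vehicle to its target.
    U_tar: unit vector of the target orientation.
    U_t:   unit vector of the current vehicle orientation.
    """
    d = np.linalg.norm(X_tar)
    if d > r_p:
        # Far from the target: head straight for it, but flip the direction
        # (drive in reverse) if the vehicle has overshot at speed, which
        # prevents circling around the target (Fig. 5).
        xi = 1.0 if d >= 0.5 * v_d**2 + r_p else f_sgn(X_tar @ U_t)
        return f_uni(X_tar) * xi
    # Inside the parking region: blend the target orientation with the
    # direction to the target; the f_pos refining term biases the field
    # towards the target and suppresses oscillation (Fig. 6).
    lam = (d / r_p + f_pos(d - eps_p)) * f_sgn(X_tar @ U_tar)
    return f_uni(U_tar + lam * f_uni(X_tar))
```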

VI-B Self-supervised Training of GNN Model

Figure 7: Shows the pipeline of the self-supervised learning part. The model is trained by running simulations on different training cases without pre-existing labels. For a simulation with $T$ steps in total, the DV2F is calculated for all vehicles at each step. For each vehicle, its DV2F yields the reference control variables, which serve as training labels. At the same time, the states of the vehicles and obstacles are fed to the model, which outputs the predicted control variables. The loss is then calculated from the reference and predicted control variables and back-propagated to update the weights of the model. The predicted control variables are also passed to the vehicle kinematic equation to update the vehicle states for the next simulation step. This process iterates until the simulation turn finishes.

The reference steering angle $\varphi^{(i)}_{t}$ and reference pedal acceleration $p^{(i)}_{t}$ mentioned in Section III can not only be used to control the vehicles directly, but can also serve as labels for supervising the training of the GNN model. Our approach can therefore additionally be used to train a learning-based Graph Neural Network (GNN) controller in a self-supervised manner, using these control labels and the network architecture given in [12]. However, [12] requires running an optimization-based procedure offline to collect enough training data with ground-truth labels, which is a computationally expensive and slow process when the number of agents is large. In contrast, our self-supervised learning method directly trains the model online during the simulation process, without collecting ground-truth labels beforehand. Note that in [12], every vehicle node in the GNN has incoming edges from all other vehicle and obstacle nodes. In that case, the number of edges in the graph grows quadratically with the number of agents (neighbouring vehicles/obstacles), leading to a heavy computational burden when scaling to more agents. Thus, we remove the edge between a neighbouring agent $j$ or $k$ and the ego vehicle $i$ if the distance $\|\mathbf{X}^{(i)}_{j}\|$ between them is greater than the threshold given by:

D^{(i)}_{j/k}=\begin{cases}2\cdot r_{veh}+|v^{(i)}_{t}|+|v^{(j)}_{t}|+2\cdot r_{c}&\text{$j$ is a vehicle}\\ r^{(k)}_{obs}+r_{veh}+|v^{(i)}_{t}|+2\cdot r_{c}&\text{$k$ is an obstacle}\end{cases} \quad (11)
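A short sketch of this edge-pruning rule, with the radii and margin defaults taken from Section VI-D (the function name is ours):

```python
import numpy as np

def edge_threshold(v_ego, v_other=None, r_obs=None, r_veh=1.5, r_c=1.5):
    """Distance threshold D of Equation 11 for keeping a graph edge.

    Pass v_other for a neighbouring vehicle j, or r_obs for a static
    obstacle k. The threshold grows with the current speeds, so faster
    vehicles keep edges to more distant neighbours.
    """
    if v_other is not None:  # neighbour j is a vehicle
        return 2 * r_veh + abs(v_ego) + abs(v_other) + 2 * r_c
    return r_obs + r_veh + abs(v_ego) + 2 * r_c  # neighbour k is an obstacle

# An edge (i, j) is kept only if the inter-agent distance is below D:
# keep = np.linalg.norm(X_ij) <= edge_threshold(v_i, v_other=v_j)
```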

Training Pipeline: The self-supervised learning pipeline is shown in Fig. 7. At the start of training, different training samples with multiple agents placed at random positions are generated as the starting points of the DV2F simulations. Each sample only contains the states of the vehicles and static obstacles, without any control labels. At each step of the simulation, the reference control variables of each vehicle are determined online using the DV2F, according to the states of the vehicles and obstacles at that time. The loss is calculated on the fly from the reference control variables and the control variables predicted by the model, and is back-propagated to update the model immediately at the current step. Unlike [12], which solves an individual optimization problem for each case one by one, our dynamic velocity vector field determines the reference control variables in closed form. Thus, we can run one simulation with multiple cases as a batch in parallel, and then switch to another batch for the next simulation turn until all training cases have been processed, which constitutes one epoch.
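The loop below sketches this pipeline in PyTorch. The callables `dv2f_reference`, `control_loss` and `kinematics_step` are placeholders for the closed-form DV2F label computation, the loss of Equation 13 and the kinematic update of Equation 1; they are our own names, not the authors' code:

```python
import torch

def train_epoch(model, optimizer, batches,
                dv2f_reference, control_loss, kinematics_step, T=200):
    """One epoch of the self-supervised pipeline of Fig. 7 (a sketch)."""
    for states in batches:                     # states: (B, N_agents, state_dim)
        for t in range(T):
            with torch.no_grad():
                ref = dv2f_reference(states)   # on-the-fly labels, no grad
            pred = model(states)               # GNN forward pass
            loss = control_loss(pred, ref, states)
            optimizer.zero_grad()
            loss.backward()                    # update the model immediately
            optimizer.step()
            # Advance the simulation with the predicted controls, detached
            # so that no gradients flow across simulation steps.
            states = kinematics_step(states, pred.detach())
```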

Loss Function: The dynamic velocity vector field yields a low reference speed limited by $|v_d|$. However, if the vehicle is still far away from its target and at low risk of colliding with other agents, the speed limit $|v_d|$ can be lifted, allowing the vehicle to move faster towards its target. To this end, we first define a vehicle state cost to evaluate a vehicle control. Assuming the current vehicle position $(x^{(i)}_{t}, y^{(i)}_{t})$, orientation $\theta^{(i)}_{t}$ and speed $v^{(i)}_{t}$ are fixed, for given control variables $\varphi^{(i)}_{t}$ and $p^{(i)}_{t}$ we first apply the vehicle kinematic equation to obtain $x^{(i)}_{t+2}$, $y^{(i)}_{t+2}$, $\theta^{(i)}_{t+1}$ and $v^{(i)}_{t+1}$. We then redefine $\|\mathbf{X}^{(i)}_{tar}\|$, $\|\mathbf{X}^{(i)}_{obs_k}\|$ and $\|\mathbf{X}^{(i)}_{veh_j}\|$ for time $t+2$. The vehicle state cost $C(\varphi^{(i)}_{t}, p^{(i)}_{t})$ is then calculated as follows:

\begin{split}C&=C_{tar}+C_{obs}+C_{veh}\\ C_{tar}&=\|\mathbf{X}^{(i)}_{tar}\|_{2}\\ C_{obs}&=\sum_{k=1}^{N_{obs}}f^{2}_{max}(-\alpha^{(i)}_{obs_{k}},0)+f_{max}(-\alpha^{(i)}_{obs_{k}},0)\\ C_{veh}&=\sum_{j=1,j\neq i}^{N_{veh}}f^{2}_{max}(-\alpha^{(i)}_{veh_{j}},0)+f_{max}(-\alpha^{(i)}_{veh_{j}},0)\end{split} \quad (12)

where $\alpha_{obs_k}^{(i)}=\|\mathbf{X}^{(i)}_{obs_k}\|_{2}-r_{obs}^{(k)}-r_{veh}-(r_{c}+|v^{(i)}_{t}|)$ and $\alpha_{veh_j}^{(i)}=\|\mathbf{X}^{(i)}_{veh_j}\|_{2}-2\cdot r_{veh}-(r_{c}+|v^{(i)}_{t}|)$ are defined as in Equation 3 of Section III-B. $C_{tar}$ measures the distance of the ego vehicle $i$ to its target, while $C_{obs}$ and $C_{veh}$ evaluate its collision risk. In principle, the gradient of this vehicle state cost alone should suffice to guide the GNN model towards reaching the target while avoiding collisions. In practice, however, the GNN does not converge to the optimum due to the strong non-linearities in Equation 12. Therefore, we combine this cost with the labels obtained from the dynamic velocity vector field, which expedites model training.
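As a sketch, the state cost of Equation 12 can be computed from the target vector and the clearance values; the function and argument names are our own:

```python
import numpy as np

def state_cost(X_tar, alphas_obs, alphas_veh):
    """Vehicle state cost C of Equation 12.

    X_tar:      vector from ego vehicle to its target, evaluated at t+2.
    alphas_obs: clearance values alpha_obs for all obstacles.
    alphas_veh: clearance values alpha_veh for all other vehicles.
    """
    def penalty(alpha):
        # f_max(-alpha, 0): only negative clearance (collision risk) is
        # penalised, quadratically plus linearly.
        m = max(-alpha, 0.0)
        return m * m + m

    return (np.linalg.norm(X_tar)
            + sum(penalty(a) for a in alphas_obs)
            + sum(penalty(a) for a in alphas_veh))
```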

The overall loss function used in this pipeline is defined as follows:

\begin{split}L&=L_{steer}+L_{pedal}\\ L_{steer}&=(\varphi^{(i)}_{t}-\tilde{\varphi}^{(i)}_{t})^{2}\\ L_{pedal}&=\begin{cases}\Delta C+f_{pos}(\Delta C)\cdot(\Delta C)^{2}&\alpha^{(i)}_{tar}\wedge\beta^{(i)}_{tar}\\ (p^{(i)}_{t}-\tilde{p}^{(i)}_{t})^{2}&\text{otherwise}\end{cases}\\ \Delta C&=C(\varphi^{(i)}_{t},\tilde{p}^{(i)}_{t})-C(\varphi^{(i)}_{t},p^{(i)}_{t})\\ \alpha^{(i)}_{tar}&=\|\mathbf{X}^{(i)}_{tar}\|_{2}-|\tilde{v}^{(i)}_{t+1}|-r_{p}>0\\ \beta^{(i)}_{tar}&=(|\tilde{v}^{(i)}_{t+1}|>v_{d})\wedge(|v^{(i)}_{t+1}|=v_{d})\wedge(\tilde{v}^{(i)}_{t+1}\cdot v^{(i)}_{t+1}>0)\end{split} \quad (13)

where $\varphi^{(i)}_{t}$, $p^{(i)}_{t}$ and $v^{(i)}_{t+1}$ are the reference steering angle, reference pedal acceleration and resulting reference speed from the dynamic velocity vector field, and $\tilde{\varphi}^{(i)}_{t}$, $\tilde{p}^{(i)}_{t}$ and $\tilde{v}^{(i)}_{t+1}$ are the corresponding values predicted by the GNN model. During training, the steering angle $\tilde{\varphi}^{(i)}_{t}$ is fully supervised by the reference steering angle $\varphi^{(i)}_{t}$. The pedal acceleration loss, however, comprises one of two parts depending on the condition of the vehicle. If the vehicle is far away from the parking region, i.e. $\|\mathbf{X}^{(i)}_{tar}\|_{2}-|\tilde{v}^{(i)}_{t+1}|-r_{p}>0$, and the default reference speed $v_d$ is what limits the predicted speed, i.e. $(|\tilde{v}^{(i)}_{t+1}|>v_{d})\wedge(|v^{(i)}_{t+1}|=v_{d})\wedge(\tilde{v}^{(i)}_{t+1}\cdot v^{(i)}_{t+1}>0)$, then the pedal loss is supervised by the relative vehicle state cost $C(\varphi^{(i)}_{t},\tilde{p}^{(i)}_{t})-C(\varphi^{(i)}_{t},p^{(i)}_{t})$. This allows the vehicle to speed up when no other agents are nearby and to slow down when approaching other objects or its target. Otherwise, the pedal acceleration loss is supervised by the reference pedal $p^{(i)}_{t}$. Note that we use the reference steering angle $\varphi^{(i)}_{t}$ rather than the predicted steering angle $\tilde{\varphi}^{(i)}_{t}$ to calculate the vehicle state cost.
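A minimal per-vehicle sketch of Equation 13, with the conditions $\alpha^{(i)}_{tar}$ and $\beta^{(i)}_{tar}$ spelled out (the function names and scalar calling convention are ours; the defaults for $r_p$ and $v_d$ come from Section VI-D):

```python
def steer_loss(phi_ref, phi_pred):
    """Steering loss of Equation 13: plain squared error."""
    return (phi_ref - phi_pred) ** 2

def pedal_loss(p_ref, p_pred, C_ref, C_pred,
               d_tar, v_ref_next, v_pred_next, r_p=5.0, v_d=2.5):
    """Pedal loss of Equation 13.

    C_ref = C(phi_ref, p_ref) and C_pred = C(phi_ref, p_pred), both
    computed with the *reference* steering angle, as noted above.
    """
    alpha_tar = d_tar - abs(v_pred_next) - r_p > 0    # far from parking region
    beta_tar = (abs(v_pred_next) > v_d                # model wants to exceed
                and abs(v_ref_next) == v_d            # the capped reference
                and v_pred_next * v_ref_next > 0)     # speed, same direction
    if alpha_tar and beta_tar:
        dC = C_pred - C_ref                           # relative state cost
        return dC + max(dC, 0.0) * dC ** 2
    return (p_ref - p_pred) ** 2                      # regress reference pedal
```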

VI-C Training & Test Data Generation

The advantage of our self-supervised approach is that the data for training the GNN model can be generated on the fly, since it does not require supervised labels. However, for the purpose of reproducibility of our experiments, we generate the data beforehand. One common option is to generate samples by choosing a vehicle's starting and target positions randomly on the navigation grid. The risk with this approach is that the dataset might be heavily skewed in favour of one scenario and fail to capture other types of situations that the vehicle is expected to encounter. In contrast, our training regime is designed to cover these diverse situations, which can primarily be segregated into 3 modes: collision, parking and normal driving. In the collision mode, the vehicles are placed such that they have a high probability of collision. This is done by first randomly choosing a point on the grid, defined as a "collision center". The starting and target positions of at least two vehicles are placed on opposite sides of this collision center with slight random deviation; hence, the model is pushed to learn collision avoidance maneuvers. Meanwhile, in the parking mode, the target position for each vehicle is sampled near its start position (within a distance of 10 m). Note that in both the collision and parking modes, the positions of the vehicles are also chosen randomly but under certain constraints, i.e. starting and target positions lying on opposite sides of a collision center (collision mode) or in close proximity (parking mode). In the normal driving mode, the starting and target positions are chosen randomly without any of the constraints described earlier. Lastly, for all modes the following additional conditions must be fulfilled: no two target positions and no two starting positions of vehicles may overlap. Likewise, if there are static obstacles, the starting/target positions cannot overlap with them. A starting position may be placed within an obstacle's circle of influence, but a target position may not, since the vehicle would then struggle to attain equilibrium: the target position attracts the ego-vehicle whereas the obstacle's influence repels it. Our training dataset contains cases ranging from 1 vehicle with 0 obstacles to 5 vehicles with 8 obstacles, each configuration with 3000 samples that in turn contain 1000 samples from each of the 3 modes. Each of these 3000 scenarios serves as the starting point of a simulation run for $T=200$ timesteps during training, in order for the vehicles to reach their target positions. As mentioned in Section IV-A, we also generate test scenarios ranging from 10 vehicles with 0 obstacles to 50 vehicles with 25 obstacles, each with 1000 samples. However, our test dataset is generated using only the collision mode.
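The three sampling modes could be implemented along the following lines. This is a sketch under assumptions: the grid size and the collision-center offset range are illustrative values not given in the text, and overlap checks and obstacle placement are omitted for brevity:

```python
import numpy as np

def sample_positions(mode, n_vehicles, grid=40.0, rng=None):
    """Sample start/target positions for one training case (a sketch)."""
    rng = rng or np.random.default_rng()
    starts, targets = [], []
    if mode == "collision":
        center = rng.uniform(0, grid, size=2)            # "collision center"
        for _ in range(n_vehicles):
            d = rng.standard_normal(2)
            d /= np.linalg.norm(d)                       # random direction
            offset = rng.uniform(5.0, 15.0) * d          # assumed offset range
            # Start and target on opposite sides of the collision center,
            # with slight random deviation.
            starts.append(center + offset + rng.normal(0, 1, 2))
            targets.append(center - offset + rng.normal(0, 1, 2))
    elif mode == "parking":
        for _ in range(n_vehicles):
            s = rng.uniform(0, grid, size=2)
            ang = rng.uniform(0, 2 * np.pi)
            r = rng.uniform(0, 10.0)                     # target within 10 m
            starts.append(s)
            targets.append(s + r * np.array([np.cos(ang), np.sin(ang)]))
    else:                                                # normal driving
        for _ in range(n_vehicles):
            starts.append(rng.uniform(0, grid, size=2))
            targets.append(rng.uniform(0, grid, size=2))
    return np.array(starts), np.array(targets)
```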

Multiple training samples can be grouped into a batch and run simultaneously. However, such a training simulation would always end with all vehicles in the batch approaching their targets and parking. To prevent the model from overfitting to parking behavior, we adopt an asynchronous simulation during training. Specifically, a training batch is further divided into multiple mini-batches which are at different simulation time steps. Within a training batch, there are thus always some samples just starting, some with vehicles en route, and the rest with vehicles parking. Furthermore, at each time step $t$ of the training simulation, we perturb the vehicle states with zero-mean Gaussian noise as a data augmentation. The standard deviation decreases linearly with the simulation time $t$: $\boldsymbol{\sigma}_{t}=\frac{T-t}{T}\cdot[\sigma_{x},\sigma_{y},\sigma_{\varphi},\sigma_{v}]^{T}$. The validation dataset reuses the training samples without random perturbation, and the simulation length during validation is reduced to $T=1$.
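The decaying perturbation is straightforward to sketch. The base standard deviations below are illustrative placeholders, since their numeric values are not listed in this section:

```python
import numpy as np

def perturb(states, t, T=200, sigma=(0.1, 0.1, 0.05, 0.05), rng=None):
    """Linearly decaying zero-mean Gaussian state perturbation (a sketch).

    states has one row (x, y, phi, v) per vehicle; the noise standard
    deviation decays from `sigma` at t=0 to zero at t=T.
    """
    rng = rng or np.random.default_rng()
    sigma_t = (T - t) / T * np.asarray(sigma)
    return states + rng.normal(size=states.shape) * sigma_t
```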

To train the neural network, we use the Adam optimizer with an initial learning rate of 0.01 and a weight decay of $10^{-6}$. The learning rate is reduced by a factor of 0.2 from its previous value if the validation loss does not decrease for 15 epochs. The number of training epochs is set to 500, but training is stopped prematurely if there is no decrease in the validation loss for 50 epochs.
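In PyTorch, this optimisation setup corresponds to roughly the following sketch (early stopping after 50 stagnant epochs would be handled in the surrounding training loop):

```python
import torch

def build_optimizer(model):
    """Adam with lr 0.01 and weight decay 1e-6; the scheduler multiplies
    the learning rate by 0.2 whenever the validation loss has not
    improved for 15 epochs."""
    optimizer = torch.optim.Adam(model.parameters(),
                                 lr=0.01, weight_decay=1e-6)
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
        optimizer, mode="min", factor=0.2, patience=15)
    return optimizer, scheduler

# Once per epoch, after validation: scheduler.step(val_loss)
```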

Figure 8: Shows the safe rate and reach rate metrics (higher is better) for different values of the static part $r_c$ of the collision avoidance margin. Panels: (a) safe rate with no obstacles; (b) safe rate with 25 obstacles; (c) reach rate with no obstacles; (d) reach rate with 25 obstacles; (e) legend.

VI-D Experimental Settings

In this section, we elaborate on the detailed hyper-parameter settings used in our experiments.

In the vehicle kinematic Equation 1, the inverse of the vehicle length $\gamma$ is 0.5 and the friction coefficient $\beta$ is 0.99. The pedal acceleration threshold $P$ introduced in Section III-A is 1 m/s$^{2}$. The steering angle limit $\Phi$ is 0.8 radians. The timestep $\Delta t$ is 0.2 s.

For the velocity vector field settings in Subsections III-B and III-C, the default velocity $v_d$ is 2.5 m/s and the parking threshold $r_p$ is 5 m. The radius of an obstacle $r^{(k)}_{obs}$ is sampled from 1 m to 3 m, while the radius of a vehicle $r^{(j)}_{veh}$ is fixed at 1.5 m. The terms $|v^{(i)}_{t}|$ and $|v^{(j)}_{t}|$ in the dynamic part of the collision avoidance margin are the corresponding vehicle speeds; in our MA-DV2F algorithm the vehicle speed never exceeds the default velocity $v_d=2.5$ m/s. The static part of the collision avoidance margin $r_c$ is fixed at 1.5 m. The position tolerance $\epsilon_p$ is 0.25 m and the orientation tolerance $\epsilon_o$ is 0.2 rad. The sensing radius of each vehicle should be larger than the threshold determined in Equation 11.
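For reference, these settings can be collected in a single configuration object; this is a convenience sketch with field names of our own choosing (not the authors' code), and the radian unit for $\epsilon_o$ is our reading of the orientation tolerance:

```python
from dataclasses import dataclass

@dataclass
class ExperimentConfig:
    """Hyper-parameter settings of Section VI-D."""
    gamma: float = 0.5    # inverse vehicle length (1/m)
    beta: float = 0.99    # friction coefficient
    P: float = 1.0        # pedal acceleration threshold (m/s^2)
    Phi: float = 0.8      # steering angle limit (rad)
    dt: float = 0.2       # simulation timestep (s)
    v_d: float = 2.5      # default velocity (m/s)
    r_p: float = 5.0      # parking threshold (m)
    r_veh: float = 1.5    # vehicle radius (m)
    r_c: float = 1.5      # static collision avoidance margin (m)
    eps_p: float = 0.25   # position tolerance (m)
    eps_o: float = 0.2    # orientation tolerance (rad, assumed unit)
```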

VI-E Sensitivity Analysis

In this Subsection, we investigate how the collision avoidance margin influences the success rate. Since the dynamic part of the collision avoidance margin is determined by the vehicle speed, we only vary the static part $r_c$ in our experiments. As can be seen in Table III and Fig. 8, if the safety margin $r_c$ is reduced from its default value of 1.5 m, the success rate drops drastically. This is because MA-DV2F considers only those neighboring agents (other vehicles and obstacles) that are in the ego-vehicle's immediate vicinity. Hence, when determining steering commands for the ego-vehicle, MA-DV2F would not consider agents that are close but not yet in the immediate vicinity. If the ego-vehicle is moving at high speed, then by the time a neighboring vehicle enters its immediate vicinity, it may be too late for the ego-vehicle to execute an evasive maneuver away from that neighboring vehicle. This leads to a higher probability of collision, hence a lower safe rate (Fig. 8(a) and 8(b)) and consequently a lower success rate (Table III), for all vehicle configurations (10-50 vehicles).

Conversely, if the safety margin $r_c$ is increased from its default value of 1.5 m, the success rate also drops, but for a different reason. With a greater safety margin, the number of neighboring agents considered by MA-DV2F for steering command prediction also grows. This causes the ego-vehicle to behave conservatively in order to prevent collisions, at the expense of reaching the target. The result is a lower reach rate (Fig. 8(c) and 8(d)) and consequently a lower success rate (Table III). This effect is further exacerbated when there are obstacles in the scene, since the presence of obstacles reduces the region in which the vehicles can navigate without causing collisions.

Number of   Number of    success rate ↑ for static part of collision avoidance margin $r_c$
Vehicles    Obstacles    0.0       0.75      1.5       2.25      3.0
10          0            0.8576    0.9994    1.0000    0.9992    0.9955
20          0            0.7910    0.9990    1.0000    0.9991    0.9876
30          0            0.7631    0.9992    1.0000    0.9976    0.9795
40          0            0.7456    0.9984    1.0000    0.9978    0.9701
50          0            0.7244    0.9982    1.0000    0.9986    0.9683
10          25           0.5324    0.8363    0.9952    0.9577    0.6637
20          25           0.4146    0.7764    0.9902    0.9171    0.5375
30          25           0.3434    0.7339    0.9844    0.8713    0.4750
40          25           0.3032    0.7032    0.9772    0.8306    0.4480
50          25           0.2775    0.6843    0.9704    0.7945    0.4363
TABLE III: Shows how the static part of the collision avoidance margin $r_c$ affects the success rate.