
Fully Distributed Informative Planning for Environmental Learning with Multi-Robot Systems

Dohyun Jang¹, Jaehyun Yoo², Clark Youngdong Son³, and H. Jin Kim¹
¹Department of Aerospace Engineering, Seoul National University (SNU), Seoul, South Korea (e-mail: dohyun@snu.ac.kr; hjinkim@snu.ac.kr). ²School of AI Convergence, Sungshin Women’s University, Seoul, South Korea (e-mail: jhyoo@sungshin.ac.kr). ³Mechatronics R&D Center, Samsung Electronics, Hwaseong, South Korea (e-mail: y0dong.son@samsung.com).
Abstract

This paper proposes a cooperative environmental learning algorithm working in a fully distributed manner. A multi-robot system is more effective for exploration tasks than a single robot, but it involves the following challenges: i) online distributed learning of environmental map using multiple robots; ii) generation of safe and efficient exploration path based on the learned map; and iii) maintenance of the scalability with respect to the number of robots. To this end, we divide the entire process into two stages of environmental learning and path planning. Distributed algorithms are applied in each stage and combined through communication between adjacent robots. The environmental learning algorithm uses a distributed Gaussian process, and the path planning algorithm uses a distributed Monte Carlo tree search. As a result, we build a scalable system without the constraint on the number of robots. Simulation results demonstrate the performance and scalability of the proposed system. Moreover, a real-world-dataset-based simulation validates the utility of our algorithm in a more realistic scenario.

Index Terms:
Multi-Robot Systems, Distributed Systems, Informative Planning, Environmental Learning, Gaussian Process.

I Introduction

Robotic sensor networks, which combine the local sensing capabilities of various sensors with the mobility of robots, can provide more versatility than conventional fixed sensor networks thanks to their ability to extend the sensing range and improve the resolution of sensory data maps [1]. These networks have been studied extensively for global environmental surveying [2, 3, 4, 5], industrial environment perception [6], radio signal search [7], and so on.

To construct a sensor network, we first deploy many sensors in a working space and then establish communication channels with a central server to collect and fuse the data acquired from all sensors. Since the wireless communication range of each sensor is limited, sensors usually connect to the central server indirectly, e.g., through a mesh network that links all sensors and the server by relay channels.

However, the relay network requires a routing table that must be rebuilt every time the robot network is reconfigured, which is cumbersome for robotic sensor networks. This problem is particularly noticeable in unmanned aerial vehicles (UAVs) or small robots since they need to use relatively weak communication modules to reduce power consumption.

Decentralizing the system can be a proper solution to network problems by removing the dependency of robots on the central server. For example, if a robot can infer the entire sensory map only from the local information directly provided by surrounding robots, the search task can be completed without the help of the central server. This paper applies decentralization to the environmental learning phase and the path planning phase, respectively. With an online information fusion algorithm, we build a distributed autonomous system of multiple robots to search and learn even dynamic environments that change over time.

Figure 1: Temperature monitoring simulation in Seoul using multiple UAVs; (left) trajectories of UAVs performing cooperative work through a distributed communication network, and (right) the reconstructed temperature map.

I-A Literature Review

The first part of our work is multi-robot environmental learning in a distributed manner. For environmental learning, several useful techniques exist, such as the Gaussian mixture model (GMM) [8, 16, 17], the finite element method (FEM) [9], and Gaussian process (GP) regression [4, 5, 7]. In particular, GP is a popular approach that derives the spatial relationship between sampled data using a kernel and performs Bayesian inference for prediction in unknown regions.

However, most GP-related research focuses on centralized systems, which are difficult to expand to large-scale multi-robot systems due to network resource limitations such as channel bandwidth and transmit power. Distributed multi-agent Gaussian regression is introduced in [12], which designs a finite-dimensional GP estimator by using the Karhunen–Loève (KL) expansion [13]. In contrast to the decentralized GP presented in [10], the distributed GP provides a common copy of the global estimate to all agents by exchanging the estimated information with their neighbors. This paper extends [14], which showed that distributed GP can construct environmental models using mobile robots, to take distributed path planning into account.

The second part of our work is informative path planning in a distributed multi-robot system. As an initial study of informative path planning, the problem of optimal sensor placement has been investigated to create an environmental map in a given space by properly placing a finite number of sensors [15, 16, 17]. Since then, by applying GPs and information theory, the research of optimal sensor placement has grown into the informative path planning research as presented in [5, 8, 18, 19, 20, 21, 22]. Some studies have combined GP with conventional planning algorithms such as rapidly-exploring random tree (RRT) [23], dynamic programming (DP) [20], or Monte Carlo tree search (MCTS) [24].

Besides the above approaches that mainly focus on informative path planning for single agents, many studies have applied informative planning to multi-robot systems. Although both [10] and [7] deal with decentralized multi-robot exploration using GP, their algorithms are not scalable since they consider only two robots. [33] introduced the combination of the Kalman filter (KF) and the reduced value iteration (RVI) method for parallelized active information gathering. While this technique is scalable to a large number of robots, the environmental model has to be known in advance, and only discrete environments can be represented since the model is expressed with a KF. Considering the scalability of multi-robot systems, we extend MCTS path planning in a distributed manner to be compatible with the distributed GP.

I-B Our Contribution

To achieve our goal of fully distributed multi-robot informative planning, we divide the whole process into two phases: environmental learning and path planning. During these phases, we focus on three main contributions as follows.

• We develop an online distributed GP algorithm for environmental learning through Karhunen–Loève expansion and an infinite impulse response filter. This algorithm is capable of learning a dynamic environment.

• We propose a distributed informative path planning algorithm using a distributed MCTS combined with GP. In addition, we introduce the trajectory merging method to consider predicted trajectories of other agents.

• We build a fully distributed exploration and learning architecture using only local peer-to-peer communication for system scalability, as shown in Table I.

We perform a multi-robot exploration simulation with a virtual environment setting and real-world dataset [34] provided by the National Climate Data Center (NCDC) in South Korea as shown in Fig. 1.

The outline of this paper is as follows. Section II briefly describes the multi-robot system setup and preliminaries. Section III presents a method for online distributed environmental learning. Section IV combines environmental learning and MCTS in the distributed system. Simulations for the synthetic environment and the real-world dataset are presented in Section V. Section VI concludes the paper.

TABLE I: Scalability comparison between centralized and distributed systems for multi-agent tasks. See text for symbols.

                                    Centralized                          Distributed (ours)
GP computation complexity           $O((mn)^3)$  [7, 10]                 $O(E^3)$  [14]
MCTS planner action cardinality     $\lvert\mathcal{A}\rvert^{n}$  [11]  $\lvert\mathcal{A}\rvert$  [26]
Communication complexity            $O(n^2)$  [11, 8]                    $O(n)$  [7, 10, 14]

II Multi-Robot System Setup and Preliminaries

We focus on the environmental learning problem in multi-robot systems by considering a target domain given as a 3-dimensional compact set $\mathbb{X}_w \subset \mathbb{R}^3$. Multiple robots (e.g., ground vehicles or UAVs with onboard sensors) explore an unknown area and estimate environmental information using both their own measurements and shared data received from neighbors. All robots can discover nearby obstacles using a range sensor and only communicate with adjacent robots within the communication distance.

Refer to caption
Figure 2: Structure of the distributed exploration and environmental model learning system. Each agent has its own distributed GP and distributed MCTS planner modules that operate through peer-to-peer communication with each other.

II-A Multi-Robot System Setup

As depicted in Fig. 1, we consider $n$ robot agents exploring the environment. Each robot $i$ takes the measurement $y_k^i$ of an unknown environmental process $f(\cdot)$ at its position $\mathbf{x}_k^i \in \mathbb{X}_w$ ($i = 1, \cdots, n$) at time $k$, with the following relationship:

$y_k^i = f(\mathbf{x}_k^i) + v_k^i,$  (1)

where the measurement of $f(\mathbf{x}_k^i)$ is corrupted by the additive white Gaussian noise $v_k^i \sim \mathcal{N}(0, \sigma_v^2)$.

Each robot has its own processing modules, the distributed GP and the distributed MCTS planner, for the distributed monitoring task. During these processes, the robots share GP variables and predicted trajectories through a peer-to-peer communication network. This operation process is summarized in Fig. 2. The controller design is not covered in this work.

To implement the communication network of $n$ robots, we define the set of neighbors of robot $i$ as $\mathcal{N}_k^i = \{\, j \mid \|\mathbf{x}_k^i - \mathbf{x}_k^j\| < d_{comm},\ j \in \mathcal{N}\setminus\{i\} \,\}$, where $\mathcal{N} = \{1, 2, \cdots, n\}$ is the index set of agents and $d_{comm}$ is the communication range. $\mathcal{N}^{i+}$ denotes $\mathcal{N}^i \cup \{i\}$. For an arbitrary variable $A$, $A^{i+}$ denotes $\{A^j\}_{j\in\mathcal{N}^{i+}}$ and $A^{i-}$ denotes $\{A^j\}_{j\in\mathcal{N}^{i}}$ for brevity.
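As a small illustration, the neighbor set can be computed directly from the agent positions; the sketch below uses illustrative names and is not part of the paper's implementation.

```python
import numpy as np

def neighbor_set(positions, i, d_comm):
    """Return the indices j != i within communication range d_comm of agent i."""
    dists = np.linalg.norm(positions - positions[i], axis=1)
    return [j for j in range(len(positions)) if j != i and dists[j] < d_comm]
```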

II-B Conventional Gaussian Process

GP regression, a data-driven non-parametric learning method, can provide Bayesian inference over the set $\mathbb{X}_w$, taking into account the joint Gaussian probability distribution of the sampled dataset [27]. In (1), the unknown process model $f(\cdot)$ is assumed to follow a zero-mean Gaussian process:

$f(\cdot) \sim \mathcal{GP}(0, \kappa(\mathbf{x}^{\prime}, \mathbf{x}^{\prime\prime})).$  (2)

$\kappa(\mathbf{x}^{\prime}, \mathbf{x}^{\prime\prime})$ is a kernel or covariance function for positions $\mathbf{x}^{\prime}, \mathbf{x}^{\prime\prime} \in \mathbb{X}_w$, and the original squared exponential (SE) kernel is defined as

$\kappa(\mathbf{x}^{\prime}, \mathbf{x}^{\prime\prime}) = \sigma_s^2 \exp\!\left(-\frac{1}{2}(\mathbf{x}^{\prime}-\mathbf{x}^{\prime\prime})^{\top}\Sigma_l^{-1}(\mathbf{x}^{\prime}-\mathbf{x}^{\prime\prime})\right),$  (3)

where $\sigma_s^2$ is the signal variance of $f(\cdot)$ and $\Sigma_l$ is the length scale. The hyperparameters $\sigma_s^2$ and $\Sigma_l$ can be determined by maximizing the marginal likelihood [27].
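To make the kernel concrete, the following minimal sketch implements (3) in Python, with the hyperparameter values used later in Section V ($\sigma_s^2 = 1$, $\Sigma_l = \text{diag}([0.02, 0.02])$) as defaults; the function name is illustrative.

```python
import numpy as np

def se_kernel(x1, x2, sigma_s2=1.0, Sigma_l=np.diag([0.02, 0.02])):
    """Squared-exponential kernel (3) with signal variance sigma_s2
    and length-scale matrix Sigma_l."""
    d = np.asarray(x1, dtype=float) - np.asarray(x2, dtype=float)
    return sigma_s2 * np.exp(-0.5 * d @ np.linalg.solve(Sigma_l, d))
```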

Formally, let $D_k^i = \{(\mathbf{x}_t^i, y_t^i)\}_{t\in M_k}$ be the training dataset sampled by robot $i$, where $t$ is the sampling time and $M_k$ is the set of sampling time indices up to time $k$. With the dataset $D_k^i$ of size $m_k^i$, we define the input data matrix $\mathbf{X}_k^i = [\mathbf{\bar{x}}_1^i, \cdots, \mathbf{\bar{x}}_{m_k^i}^i]^{\top} \in \mathbb{R}^{m_k^i \times 3}$ and the output data vector $\mathbf{y}_k^i = [\bar{y}_1^i, \cdots, \bar{y}_{m_k^i}^i]^{\top} \in \mathbb{R}^{m_k^i \times 1}$. For a test point $\mathbf{x} \in \mathbb{X}_w$, the posterior distribution over $f(\mathbf{x})$ by robot $i$ is derived as follows:

$p(f(\mathbf{x}) \mid \mathbf{X}_k^i, \mathbf{y}_k^i, \mathbf{x}) \sim \mathcal{N}(\bar{f}^i(\mathbf{x}), \Sigma^i(\mathbf{x})),$  (4)

where

$\bar{f}^i(\mathbf{x}) = \mathbf{k}^{\top}(\mathbf{X}_k^i, \mathbf{x})\,(\mathbf{K}(\mathbf{X}_k^i, \mathbf{X}_k^i) + \sigma_v^2 I)^{-1}\,\mathbf{y}_k^i,$  (5a)
$\Sigma^i(\mathbf{x}) = \kappa(\mathbf{x}, \mathbf{x}) - \mathbf{k}^{\top}(\mathbf{X}_k^i, \mathbf{x})\,(\mathbf{K}(\mathbf{X}_k^i, \mathbf{X}_k^i) + \sigma_v^2 I)^{-1}\,\mathbf{k}(\mathbf{X}_k^i, \mathbf{x}).$  (5b)

$\mathbf{K}(\mathbf{X}_k^i, \mathbf{X}_k^i)$ is the $m_k^i \times m_k^i$ kernel matrix whose $(u,v)$-th element is $\kappa(\mathbf{\bar{x}}_u^i, \mathbf{\bar{x}}_v^i)$ for $\mathbf{\bar{x}}_u^i, \mathbf{\bar{x}}_v^i \in \mathbf{X}_k^i$. $\mathbf{k}(\mathbf{X}_k^i, \mathbf{x})$ is the $m_k^i \times 1$ column vector obtained in the same way.
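The posterior equations (5a)-(5b) translate directly into a few lines of linear algebra. The sketch below is a hedged reference implementation assuming a kernel function such as `se_kernel` above; it is not the authors' code.

```python
import numpy as np

def gp_posterior(X, y, x_star, kernel, sigma_v2):
    """Posterior mean (5a) and variance (5b) of f at a test point x_star."""
    m = X.shape[0]
    K = np.array([[kernel(xu, xv) for xv in X] for xu in X])  # K(X, X)
    k = np.array([kernel(xu, x_star) for xu in X])            # k(X, x)
    A = np.linalg.solve(K + sigma_v2 * np.eye(m), np.column_stack([y, k]))
    mean = k @ A[:, 0]                        # k^T (K + sigma_v^2 I)^{-1} y
    var = kernel(x_star, x_star) - k @ A[:, 1]
    return mean, var
```

Note the cubic cost of the solve in the number of samples; this is exactly the scaling bottleneck that motivates the distributed estimator of Section III.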

II-C Informative Path Planning

To obtain a better description of the spatial process model, robots perform informative path planning, which maximizes the information gain $\mathbb{I}(\cdot\,;\cdot)$, the mutual information between the process $f$ and the measurements $\mathbf{y}$:

$\mathbb{I}(\mathbf{y}; f) = \mathrm{H}(\mathbf{y}) - \mathrm{H}(\mathbf{y} \mid f),$  (6)

where $\mathrm{H}(\cdot)$ is the entropy of a random variable. Let $X_{1:k}^i$ be the possible trajectory of robot $i$ and $X_{1:k} = \{X_{1:k}^1, \cdots, X_{1:k}^n\}$ be the possible trajectories of all robots. Then, the multi-robot team's global objective function $J(X_{1:k})$ is defined as follows:

$J(X_{1:k}) = \mathbb{I}(\mathbf{y}_{1:k}; f).$  (7)

$\mathbf{y}_{1:k}$ denotes the measurements corresponding to $X_{1:k}$. As a result, the optimal trajectories for all agents are defined as follows:

$X_{1:k}^{*} = \operatorname*{arg\,max}_{X_{1:k}} J(X_{1:k}) = \operatorname*{arg\,max}_{X_{1:k}} \mathbb{I}(\mathbf{y}_{1:k}; f) = \operatorname*{arg\,max}_{X_{1:k}} \left(\mathrm{H}(\mathbf{y}_{1:k}) - \mathrm{H}(\mathbf{y}_{1:k} \mid f)\right).$  (8)

In this study, the entropy $\mathrm{H}(\cdot)$ is obtained using GP. With the result of the GP estimation (5), the optimal trajectory generation for each agent is addressed in Section IV.

III Environmental Learning: Distributed Gaussian Process

In this section, we expand the conventional GP in Section II-B to the distributed GP. The first step is to expand the conventional kernel (3) to be an infinite sum of eigenfunctions. Then, the expanded kernel is used to make a finite-dimensional GP estimator, and the estimator is reformulated to a distributed form. With a consecutive state update rule, the GP estimator works in a distributed manner.

III-A Karhunen–Loève (KL) Kernel Expansion

Consider the usual GP with $n$ robots. We define the input data matrix for $n$ robots as $\mathbf{X}_k = [\mathbf{X}_k^{1\top}, \cdots, \mathbf{X}_k^{n\top}]^{\top} \in \mathbb{R}^{mn\times 3}$. For simplicity, we assume that the $m_k^i$ are the same for all robots and omit the subscript $k$, so $m_k^i = m$ hereafter. With the matrix $\mathbf{X}_k$, the usual GP requires all the sampled data $\mathbf{X}_k$ and the inversion of $\mathbf{K}(\mathbf{X}_k, \mathbf{X}_k)$ with $O((mn)^3)$ operations. These requirements are impractical when only peer-to-peer communication is available, and the computational burden grows with the data size. For this reason, a new kernel method is needed. The kernel (3) can be expanded in terms of eigenfunctions $\phi_e$ and corresponding eigenvalues $\lambda_e$ as follows [13]:

$\kappa(\mathbf{x}^{\prime}, \mathbf{x}^{\prime\prime}) = \sum_{e=1}^{+\infty} \lambda_e \phi_e(\mathbf{x}^{\prime}) \phi_e(\mathbf{x}^{\prime\prime}),$  (9)

where $\lambda_e \phi_e(\mathbf{x}^{\prime}) = \int_{\mathbb{X}_w} \kappa(\mathbf{x}^{\prime}, \mathbf{x}^{\prime\prime})\, \phi_e(\mathbf{x}^{\prime\prime})\, d\mu(\mathbf{x}^{\prime\prime})$. It is difficult to derive the kernel eigenfunctions in closed form, but the SE kernel expansion has already been obtained via Hermite polynomials, as mentioned in [28]. Then, the process model $f$ at a position $\mathbf{x} \in \mathbb{X}_w$ is expanded as

$f(\mathbf{x}) = \sum_{e=1}^{E} a_e \phi_e(\mathbf{x}) + \sum_{e=E+1}^{+\infty} a_e \phi_e(\mathbf{x}) = f_E(\mathbf{x}) + \sum_{e=E+1}^{+\infty} a_e \phi_e(\mathbf{x}),$  (12)

where $a_e \sim \mathcal{N}(0, \lambda_e)$ for $e = 1, 2, \cdots, \infty$. $f_E(\mathbf{x})$ is the $E$-dimensional model of $f(\mathbf{x})$, where $E$ is a constant design parameter that can be tuned by the SURE strategies [12]. As shown in [28], the optimal $E$-dimensional model can be obtained by a convex combination of the first $E$ kernel eigenfunctions as the size of the sampled dataset increases to infinity.

III-B Multi-Agent Distributed Gaussian Process

We apply the $E$-dimensional approximation to the GP estimator in (5) to derive the estimate of $f_E(\mathbf{x})$. Under this approximation, the kernel function (9) can be described as $\kappa(\mathbf{x}^{\prime}, \mathbf{x}^{\prime\prime}) \approx \sum_{e=1}^{E} \lambda_e \phi_e(\mathbf{x}^{\prime}) \phi_e(\mathbf{x}^{\prime\prime})$. For the input data matrix $\mathbf{X}_k$, the kernel matrices included in (5) are defined by

$\mathbf{K}(\mathbf{X}_k, \mathbf{X}_k) = G \Lambda_E G^{\top},$  (13a)
$\mathbf{k}(\mathbf{X}_k, \mathbf{x}) = G \Lambda_E \Phi(\mathbf{x}),$  (13b)

where $\Phi(\mathbf{x}) := [\phi_1(\mathbf{x}), \cdots, \phi_E(\mathbf{x})]^{\top}$ and $G := [\Phi(\mathbf{\bar{x}}_1^1) \cdots \Phi(\mathbf{\bar{x}}_m^1), \cdots, \Phi(\mathbf{\bar{x}}_1^n) \cdots \Phi(\mathbf{\bar{x}}_m^n)]^{\top}$. $\Lambda_E$ is the diagonal matrix of kernel eigenvalues.

With (5a) and (13), the $E$-dimensional estimator for GP is expressed as follows [12]:

$\bar{f}_E(\mathbf{x}) := \Phi^{\top}(\mathbf{x}) H_E \mathbf{y},$  (15)

where

$H_E := \left(\dfrac{G^{\top}G}{mn} + \dfrac{\sigma_v^2}{mn}\Lambda_E^{-1}\right)^{-1}\dfrac{G^{\top}}{mn}.$  (16)

Because each agent cannot obtain $G$ and $\mathbf{y}$ in (15) without a fully connected network, we decompose the associated terms included in (16) as follows:

$\dfrac{G^{\top}G}{mn} = \dfrac{1}{mn}\sum_{i=1}^{n}\sum_{t=1}^{m}\Phi(\mathbf{\bar{x}}_t^i)\Phi^{\top}(\mathbf{\bar{x}}_t^i) = \dfrac{1}{n}\sum_{i=1}^{n}\alpha_m^i,$  (17a)
$\dfrac{G^{\top}\mathbf{y}}{mn} = \dfrac{1}{mn}\sum_{i=1}^{n}\sum_{t=1}^{m}\Phi(\mathbf{\bar{x}}_t^i)\bar{y}_t^i = \dfrac{1}{n}\sum_{i=1}^{n}\beta_m^i,$  (17b)

where $\alpha_m^i := \sum_{t=1}^{m}\Phi(\mathbf{\bar{x}}_t^i)\Phi^{\top}(\mathbf{\bar{x}}_t^i)/m$ and $\beta_m^i := \sum_{t=1}^{m}\Phi(\mathbf{\bar{x}}_t^i)\bar{y}_t^i/m$ are the GP states after the $m$-th sensor measurements. Now (15) is reformulated in the following distributed form:

$\bar{f}_E^i(\mathbf{x}) := \Phi^{\top}(\mathbf{x})\left(\alpha_m^i + \dfrac{\sigma_v^2}{mn}\Lambda_E^{-1}\right)^{-1}\beta_m^i.$  (18)

As a result of the average consensus protocol [29], (18) converges to (15) after iterative communication. Similarly, the distributed form of $\Sigma(\mathbf{x})$ in (5b) is expressed as

$\Sigma_E(\mathbf{x}) := \kappa(\mathbf{x}, \mathbf{x}) - \Phi^{\top}(\mathbf{x}) H_E G \Lambda_E \Phi(\mathbf{x}),$  (19)
$\Sigma_E^i(\mathbf{x}) := \kappa(\mathbf{x}, \mathbf{x}) - \Phi^{\top}(\mathbf{x})\left(\alpha_m^i + \dfrac{\sigma_v^2}{mn}\Lambda_E^{-1}\right)^{-1}\alpha_m^i \Lambda_E \Phi(\mathbf{x}).$  (20)

When we compare (5) with (18) and (20), the computational complexity of the distributed algorithm is $O(E^3)$, whereas that of the centralized algorithm is $O((mn)^3)$ due to the matrix inversion [12]. Therefore, the distributed algorithm is more scalable since $E \ll mn$ in general.
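For concreteness, the per-agent estimators (18) and (20) can be evaluated from the consensus states alone, as in the following sketch; `Phi` is assumed to return the $E$ eigenfunction values at a point and `lam_E` the eigenvalues of $\Lambda_E$, and all names are illustrative.

```python
import numpy as np

def distributed_gp_estimate(x, alpha, beta, Phi, lam_E, sigma_v2, m, n, kappa_xx):
    """Per-agent posterior mean (18) and variance (20) from GP states alpha, beta."""
    phi = Phi(x)                                        # (E,) eigenfunction values
    reg = (sigma_v2 / (m * n)) * np.diag(1.0 / lam_E)   # sigma_v^2/(mn) * Lambda_E^{-1}
    A = np.linalg.inv(alpha + reg)
    mean = phi @ A @ beta                               # eq. (18)
    var = kappa_xx - phi @ A @ alpha @ np.diag(lam_E) @ phi  # eq. (20)
    return mean, var
```

Only $E \times E$ matrices are inverted here, which is where the $O(E^3)$ complexity in Table I comes from.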

III-C Online Information Fusion by Moving Agents

If the $(m+1)$-th new training data $\{(\mathbf{\bar{x}}_{m+1}^i, \bar{y}_{m+1}^i)\}_{i=1}^n$ are obtained, $\{\alpha_m^i\}_{i=1}^n$ and $\{\beta_m^i\}_{i=1}^n$ have to be discarded to include the new data, so the consensus process must be restarted from scratch. To avoid repeated restarts and keep the continuity of the environmental estimate, we introduce an online information fusion algorithm.

Let us assume for convenience that the sensor measurement frequencies of all agents are the same. The update rule of $\alpha_m^i$ and $\beta_m^i$ is defined as follows:

$\alpha_{m+1}^i = (1-r)\,\alpha_m^i + r\,\Phi(\mathbf{\bar{x}}_{m+1}^i)\Phi^{\top}(\mathbf{\bar{x}}_{m+1}^i),$
$\beta_{m+1}^i = (1-r)\,\beta_m^i + r\,\Phi(\mathbf{\bar{x}}_{m+1}^i)\,\bar{y}_{m+1}^i,$  (21)

where $\alpha_0^i = 0$ and $\beta_0^i = 0$. This rule is an infinite impulse response (IIR) filter. If $r = (m+1)^{-1}$, the update rule weights all data equally, which suits static environmental learning. If $r > (m+1)^{-1}$, the rule weights recent data more heavily, which suits dynamic environmental learning. Simulation results for both cases are shown in Section V.
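A minimal sketch of the update (21) follows, with the two choices of $r$ noted in the comments; names are illustrative.

```python
import numpy as np

def update_gp_states(alpha, beta, phi_new, y_new, r):
    """One IIR update (21). phi_new = Phi(x_{m+1}^i); y_new is the new measurement.
    r = 1/(m+1) weights all data equally (static case); a larger r weights
    recent data more heavily (dynamic case)."""
    alpha = (1.0 - r) * alpha + r * np.outer(phi_new, phi_new)
    beta = (1.0 - r) * beta + r * phi_new * y_new
    return alpha, beta
```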

Theorem 1.

Using an average consensus protocol and the update rules in (21), new data are successively fused with the existing $\{\alpha_m^i\}_{i=1}^n$ and $\{\beta_m^i\}_{i=1}^n$, so that $\{\alpha_{m+1}^i\}_{i=1}^n$ and $\{\beta_{m+1}^i\}_{i=1}^n$ converge towards (17a) and (17b), respectively, in a distributed manner.

Proof.

See the Appendix in [14]. ∎
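For intuition, one synchronous iteration of the standard average-consensus protocol of [29] over the GP states could look like the sketch below; `eps` is a step size below the inverse of the maximum node degree, and the data layout is an assumption for illustration.

```python
def consensus_step(states, neighbors, eps):
    """One average-consensus iteration: states maps agent -> (alpha, beta),
    neighbors maps agent -> list of neighbor ids. Repeating this step drives
    every (alpha, beta) toward the network-wide averages (17a)-(17b)."""
    new_states = {}
    for i, (a_i, b_i) in states.items():
        da = sum(states[j][0] - a_i for j in neighbors[i])
        db = sum(states[j][1] - b_i for j in neighbors[i])
        new_states[i] = (a_i + eps * da, b_i + eps * db)
    return new_states
```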

Figure 3: Illustration of the distributed MCTS process. Nearby robots exchange their predicted trajectories. These trajectories are used to temporarily update GPs and grow search trees. This process is repeated until the time budget is met.

IV Path Planning: Distributed Monte Carlo Tree Search

Using the distributed model learning discussed in the previous section, all agents create a local environmental map that converges to the global environmental map even in a distributed network. To find the most promising search trajectories with the learned map, all agents should consider every possible action. However, because the cardinality of the possible action set grows exponentially with the number of robots, a distributed planning strategy is needed in multi-robot path planning [33]. In [26], a decentralized MCTS approach is studied that reduces the cardinality of the possible action set from $\lvert\mathcal{A}\rvert^n$ to $\lvert\mathcal{A}\rvert$, where $\mathcal{A}$ represents the discrete action space of each robot. We apply this advantage to our GP-based informative planning for multiple robots. With the distributed MCTS, each robot calculates a promising trajectory by communicating with neighboring agents only. This process is shown in Fig. 3. This section introduces the trajectory merging method to reflect neighbors' paths in each agent's tree search process. The contents of this section are summarized in Algorithms 1 and 2.

IV-A Trajectory Merging

For each agent $i$, $X_{k+1:k+T}^i = (\mathbf{\hat{x}}_{k+1}^i, \cdots, \mathbf{\hat{x}}_{k+T}^i)$ denotes the predicted trajectory with prediction length $T$, written $X_T^i$ for brevity. Assuming that agent $i$ receives the predicted trajectories of neighboring agents $X_T^{i-} = \{X_T^j\}_{j\in\mathcal{N}_i}$, we modify the GP state $\alpha_m^i$ as follows:

$\hat{\alpha}_m^i(X_T^{i-}) = \dfrac{mn}{mn + \mathrm{n}(X_T^{i-})}\,\alpha_m^i + \dfrac{\mathrm{n}(X_T^{i-})}{mn + \mathrm{n}(X_T^{i-})}\sum_{j\in\mathcal{N}_i}\sum_{t=1}^{T}\Phi(\mathbf{\hat{x}}_{k+t}^j)\Phi^{\top}(\mathbf{\hat{x}}_{k+t}^j).$  (22)

$\mathrm{n}(X_T^{i-})$ is the number of sensing points included in $X_T^{i-}$. With (22) and the $E$-dimensional estimator in (20), we define the trajectory-merged GP estimator as follows:

$\hat{\Sigma}_E^i(\mathbf{x}) := \kappa(\mathbf{x}, \mathbf{x}) - \Phi^{\top}(\mathbf{x})\left(\hat{\alpha}_m^i + \dfrac{\sigma_v^2}{mn + \mathrm{n}(X_T^{i-})}\Lambda_E^{-1}\right)^{-1}\hat{\alpha}_m^i \Lambda_E \Phi(\mathbf{x}).$  (23)

In this way, the predicted trajectories of neighboring agents are temporarily included in the acquired dataset of the GP estimator. Because (22) and (23) are temporary values for tree search in the distributed MCTS, they do not affect $\alpha_m^i$ and disappear once new predicted trajectories arrive. This process is summarized in the Distributed MCTS block of Fig. 2. With this result, the path planning process is explained in the next subsection.
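A minimal sketch of the merge (22) follows; `pred_points` is the flat list of predicted sensing points in $X_T^{i-}$, and the second term of (22) is read here as a weighted average over the $\mathrm{n}(X_T^{i-})$ new points so that the two weights sum to one. All names are illustrative.

```python
import numpy as np

def merge_trajectories(alpha, Phi, pred_points, m, n):
    """Temporarily fold neighbors' predicted sensing points into the GP state."""
    n_new = len(pred_points)
    if n_new == 0:
        return alpha
    new_avg = sum(np.outer(Phi(x), Phi(x)) for x in pred_points) / n_new
    w = (m * n) / (m * n + n_new)      # weight on the existing state
    return w * alpha + (1.0 - w) * new_avg
```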

IV-B Informational Reward Function

As mentioned in (7), the information gain $\mathbb{I}$ is the objective function to maximize. With the definition in (8), the optimal trajectory considering neighboring paths is defined as follows:

$X_T^{i*} = \operatorname*{arg\,max}_{X_T^i \in \mathbb{X}_k^i} J(X_T^i \cup X_T^{i-} \cup X_{1:k}) = \operatorname*{arg\,max}_{X_T^i \in \mathbb{X}_k^i} \mathbb{I}(\mathbf{y}_T^i \cup \mathbf{y}_T^{i-} \cup \mathbf{y}_{1:k}; f),$  (24)

where $\mathbf{y}_T^i$ and $\mathbf{y}_T^{i-}$ are the measurements corresponding to $X_T^i$ and $X_T^{i-}$, respectively, and $\mathbb{X}_k^i$ is the domain of possible trajectories for agent $i$. As shown in (6), the information gain is represented with entropies as follows:

$\mathbb{I}(\mathbf{y}_T^i \cup \mathbf{y}_T^{i-} \cup \mathbf{y}_{1:k}; f) = \mathrm{H}(\mathbf{y}_T^i \cup \mathbf{y}_T^{i-} \cup \mathbf{y}_{1:k}) - \mathrm{H}(\mathbf{y}_T^i \cup \mathbf{y}_T^{i-} \cup \mathbf{y}_{1:k} \mid f).$  (25)

Using the measurement model (1) and the entropy of the normal distribution [32], the conditional entropy becomes

$\mathrm{H}(\mathbf{y}_T^i \cup \mathbf{y}_T^{i-} \cup \mathbf{y}_{1:k} \mid f) = \frac{1}{2}\log\lvert 2\pi e \sigma_v^2 I\rvert.$  (26)

$\mathrm{H}(\mathbf{y}_T^i \cup \mathbf{y}_T^{i-} \cup \mathbf{y}_{1:k})$ is decomposed using conditional entropy and is obtained by calculating the entropy of the GP as follows:

$\mathrm{H}(\mathbf{y}_T^i \cup \mathbf{y}_T^{i-} \cup \mathbf{y}_{1:k}) = \mathrm{H}(\mathbf{y}_T^{i-} \cup \mathbf{y}_{1:k}) + \mathrm{H}(\mathbf{y}_T^i \mid \mathbf{y}_T^{i-} \cup \mathbf{y}_{1:k}) \approx \mathrm{H}(\mathbf{y}_T^{i-} \cup \mathbf{y}_{1:k}) + \frac{1}{2}\log\lvert 2\pi e \hat{\Sigma}_E^i(X_T^i)\rvert.$  (27)

As a result, with the trajectory-merged GP estimator in (23), the optimal trajectory for agent $i$ is defined as follows:

$X_T^{i*} = \operatorname*{arg\,max}_{X_T^i \in \mathbb{X}_k^i} J(X_T^i \cup X_T^{i-} \cup X_{1:k}) \approx \operatorname*{arg\,max}_{X_T^i \in \mathbb{X}_k^i} \frac{1}{2}\log\lvert 2\pi e \hat{\Sigma}_E^i(X_T^i)\rvert = \operatorname*{arg\,max}_{X_T^i \in \mathbb{X}_k^i} \mathcal{R}^i(X_T^i).$  (28)

We call $\mathcal{R}^i(\cdot)$ the informational reward function, which is utilized in the tree search algorithm.
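Computationally, the reward is just a Gaussian log-determinant entropy, as in this hedged sketch; `merged_posterior_cov` is assumed to return the $T \times T$ covariance $\hat{\Sigma}_E^i(X_T^i)$ built from (23).

```python
import numpy as np

def informational_reward(traj, merged_posterior_cov):
    """Reward R^i(X_T^i) in (28): entropy of the merged GP posterior on traj."""
    S = merged_posterior_cov(traj)
    _, logdet = np.linalg.slogdet(2.0 * np.pi * np.e * S)  # log |2*pi*e*S|
    return 0.5 * logdet
```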

Figure 4: Node closing method for obstacle avoidance in MCTS. (left) finite action space that the robot can take. (right) tree expansion and node closing process.

IV-C Tree Search with D-UCB Algorithm

Using the informational reward function defined in Section IV-B, the tree search algorithm iteratively explores and evaluates predictive path candidates according to the discounted upper confidence bound (D-UCB) rule to find the optimal path. The D-UCB rule assigns probabilistic search priority to the action candidates.

The tree structure consists of nodes $s$ and edges $(s,a)$ for all legal actions $a \in \mathcal{A}(s)$. Each edge contains a set of variables $\{N_s^a, W_s^a, \tau_s^a, C_s^a\}$, where $N_s^a$ is the visit count, $W_s^a$ is the total action value, $\tau_s^a$ is the tree search iteration index (line 5 of Algorithm 2), and $C_s^a$ is a closing variable discussed below. We follow the tree search process in Algorithm 1 and the distributed MCTS with GP in Algorithm 2. The MCTS process can be divided into four main steps as follows.

IV-C1 Selection

(lines 4, 12-22 of Algorithm 1) The selection phase focuses on finding a leaf node $s_{\textit{leaf}}$. Following the D-UCB rule [31], the selected action at node $s$ is defined as follows:

$a_t = \operatorname*{arg\,max}_{a\in\mathcal{A}(s)}\left(\dfrac{W_s^a\,\gamma^{\tau-\tau_s^a}}{N_s^a\,\gamma^{\tau-\tau_s^a}} + U_s^a\right),$  (29)

where

$U_s^a = \sqrt{\dfrac{\ln\sum_{a^{\prime}\in\mathcal{A}(s)} N_s^{a^{\prime}}\gamma^{\tau-\tau_s^{a^{\prime}}}}{N_s^a\,\gamma^{\tau-\tau_s^a}}}.$  (30)

The first term on the right-hand side of (29) is the exploitation term of the tree search, and the second term is the exploration term. As shown in Algorithm 2, each agent periodically receives the predicted trajectories of adjacent agents, which are utilized in the tree search process. This means that a tree built with previously received trajectories may no longer be optimal when new neighboring trajectories arrive. Therefore, adopting the discount factor $\gamma$ makes previously visited nodes less influential on the current UCB value.

If the current node has no selectable actions because of path blockage, the node closes ($C_s^a \leftarrow 1$) and the algorithm returns to the parent node to restart the selection process. We call this the node closing method, illustrated in Fig. 4 and lines 15-21 of Algorithm 1.
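Before the listing, the sketch below shows how the D-UCB rule (29)-(30) and node closing might interact at a single node; the edge representation is an assumption, and the log argument is clamped at zero for numerical safety.

```python
import math

def d_ucb_select(edges, tau, gamma=0.99):
    """D-UCB action selection at one node. edges maps each action to a dict
    {"N": visits, "W": total value, "tau": last search index, "closed": bool}.
    Returns None when all edges are closed (node closing: back up to parent).
    Assumes every open edge has been visited at least once."""
    open_edges = {a: e for a, e in edges.items() if not e["closed"]}
    if not open_edges:
        return None
    # Discounted visit counts N_s^a * gamma^(tau - tau_s^a), as in eq. (30).
    disc_N = {a: e["N"] * gamma ** (tau - e["tau"]) for a, e in open_edges.items()}
    total = sum(disc_N.values())

    def ucb(a):
        e = open_edges[a]
        value = (e["W"] * gamma ** (tau - e["tau"])) / disc_N[a]   # exploitation
        bonus = math.sqrt(max(math.log(total), 0.0) / disc_N[a])   # exploration
        return value + bonus

    return max(open_edges, key=ucb)
```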

1  Function TreeSearch($T_k^i$, $\tau$):
2      $s_0 \leftarrow$ getRootNode($T_k^i$)
3      for $t \leftarrow 1$ to $N_{\text{iteration}}$ do
4          $s_{\textit{leaf}} \leftarrow$ selection($s_0$, $\tau$)
5          $(s_t, a) \leftarrow$ expansion($s_{\textit{leaf}}$, $\tau$)
6          if collisionCheck($s_t$) then
7              $C_{s_{\textit{leaf}}}^{a} \leftarrow$ true
8              continue
9          $r_t \leftarrow$ simulation($s_t$)
10         backprop($s_t$, $r_t$)
11     return $T_k^i$

12 Function selection($s_{\textit{leaf}}$, $\tau$):
13     while not leafNodeFound do
14         if not depth($s_{\textit{leaf}}$) $> T$ then
15             if $C_{s_{\textit{leaf}}}^{a} =$ true $\ \forall a \in \mathcal{A}(s_{\textit{leaf}})$ then
16                 $C_{s_{\textit{leaf}-1}}^{a_{\textit{leaf}-1}} \leftarrow$ true    ▷ back to the parent node
17                 $s_{\textit{leaf}} \leftarrow s_{\textit{leaf}-1}$
18             else
19                 $a \leftarrow$ D-UCB($s_{\textit{leaf}}$, $\tau$)    ▷ eq. (29)
20                 $s_{\textit{leaf}} \leftarrow$ getNode($s_{\textit{leaf}}$, $a$)
21     end while
22     return $s_{\textit{leaf}}$

23 Function simulation($s_t$):
24     $X_T^i \leftarrow$ trajectory from $s_0$ to $s_t$
25     for $j \leftarrow$ depth($s_t$) to $T$ do
26         $X_T^i \leftarrow \{X_T^i,\ \text{randomWalk}(\mathbf{x}_{k+j}^i)\}$
27     return $\mathcal{R}^i(X_T^i)$

Algorithm 1: Tree Search with Obstacle Avoidance

IV-C2 Expansion

(line 5 of Algorithm 1) The expansion phase expands the selected node with a uniformly sampled action from the action space $\mathcal{A}(s)$ if the depth of the selected node does not exceed the search depth $T$. When the expanded node collides with an obstacle, the algorithm closes this edge ($C_s^a \leftarrow 1$) and returns to the selection phase.

IV-C3 Simulation

(lines 9, 23-27 of Algorithm 1) The simulation phase calculates the informational reward $\mathcal{R}^i(X_T^i)$ of the selected trajectory. If the selected node's depth is less than the search depth $T$, it first performs random walks to fill the remaining steps. After that, the reward is calculated with the predicted trajectory $X_T^i$ as shown in Section IV-B.

IV-C4 Backpropagation

(line 10 of Algorithm 1) The edge variables are updated in a backward pass. The visit counts are incremented, $N_s^a \leftarrow N_s^a\gamma^{\tau-\tau_s^a} + 1$, and the total action value is updated, $W_s^a \leftarrow W_s^a\gamma^{\tau-\tau_s^a} + \mathcal{R}^i$. As described in the selection step, the discount factor $\gamma$ is applied to reduce the weight of previous values.
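A hedged sketch of this backward pass, using the same edge representation as the D-UCB sketch above:

```python
def backprop(path_edges, reward, tau, gamma=0.99):
    """Discount stale statistics, then add the new visit and reward, for every
    edge on the path from the expanded node back to the root."""
    for e in path_edges:
        decay = gamma ** (tau - e["tau"])
        e["N"] = e["N"] * decay + 1.0
        e["W"] = e["W"] * decay + reward
        e["tau"] = tau                  # record the current search iteration
```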

Input: $\mathbf{x}_k^i$, $\alpha_{m-1}^i$, $\beta_{m-1}^i$, $X_T^{i-}$
Output: $X_T^i$
1   $y_k^i \leftarrow$ getMeasurement($\mathbf{x}_k^i$)    ▷ eq. (1)
2   $(\alpha_m^i, \beta_m^i) \leftarrow$ updateGP($\alpha_{m-1}^i$, $\beta_{m-1}^i$, $\mathbf{x}_k^i$, $y_k^i$)    ▷ eq. (21)
3   $(\alpha_m^i, \beta_m^i) \leftarrow$ GPconsensus($\alpha_m^i$, $\beta_m^i$)    ▷ eq. (18) in [14]
4   $T_k^i \leftarrow$ initializeTree($\mathbf{x}_k^i$)
5   for $\tau \leftarrow 1$ to $N_{\text{search}}$ do
6       $(\hat{\alpha}_m^i, \hat{\beta}_m^i) \leftarrow$ trajectoryMerging($X_T^{i-}$)    ▷ eq. (22)
7       $T_k^i \leftarrow$ TreeSearch($T_k^i$, $\hat{\alpha}_m^i$, $\hat{\beta}_m^i$, $\tau$)    ▷ Algorithm 1
8       $X_T^i \leftarrow$ getBestTraj($T_k^i$)
9       $X_T^{i-} \leftarrow$ communicateTraj($X_T^i$)
10  return $X_T^i$

Algorithm 2: Distributed MCTS with GP for agent $i$

V Simulation Results

This section presents environmental learning simulations in various situations. The first simulation uses a time-invariant synthetic environment, and the second uses a dynamic environment based on a real-world meteorological dataset. The environmental models are unknown a priori, and each robot obtains sensory data at its current location. Furthermore, since the communication range is finite, some agents may not be able to communicate with each other.

Refer to caption
Figure 5: Simulation 1-A. The process of environmental model construction by 12 agents with fully distributed informative planning. (a) Change of the GP estimate and (b) uncertainty propagation over time from 0 to 50 seconds, in order from the left figure. (c) Ground truth of the environmental model. Yellow lines in (b) indicate communication links between agents. All presented results are obtained by agent #1 (red dot in (b)). (d) Environmental model estimation error. The solid green line shows the box plot of RMSE values between all agents and the ground truth. The dashed red line shows the box plot of RMSE values between all agents and agent #1. This graph shows that all GP estimates converge to the same result with only local communication.
(a) Search depths
(b) Number of iterations
(c) Number of agents
Figure 6: Simulation 1-B. Environmental model construction under different conditions. (a) Exploration results of 6 agents with different search depths. (b) Exploration results of 8 agents with different numbers of iterations. (c) Exploration results with different numbers of agents.

V-A Simulation 1 - synthetic environment learning

We perform the fully distributed informative planning simulation for multiple agents. They conduct exploration to obtain an estimate of the environmental map, considering collision avoidance and coordination. They can communicate only with neighbors within a range of 10 m (the map size is 20 m × 20 m) and move at a constant 1 m/s. We set $\sigma_s^2 = 1$ and $\Sigma_l = \text{diag}([0.02, 0.02])$ for the Gaussian kernel (3), and $E = 80$ for the $E$-dimensional estimators (15) and (19).

The progress over time from 0 to 50 seconds is shown in Fig. 5. As shown in Figs. 5(a)-(b), twelve agents search the map together and generate the GP estimate of the ground truth model in Fig. 5(c). The agents scatter naturally and find the next locations to visit based on the variance map. Also, as they avoid places where the estimate is already reliable, they minimize the redundant actions that would decrease exploration efficiency. Fig. 5(b) confirms that the information of all agents is diffused through the communication links.

Although Figs. 5(a)-(b) show results from agent #1 only, all the distributed GP estimates converge to the same map by average consensus, as shown in Fig. 5(d). In other words, the agents do not simply use local information in the exploration process but construct a global GP estimation map in a distributed manner.

Figure 7: Simulation 1-C. Exploration results of 4 agents with different types of systems over 10 trials (with different environments).
Figure 8: Simulation 2. The process of real-world heat map reconstruction performed by 6 moving UAVs with fully distributed informative planning. (a) Actual change of the heat map and (b) environmental learning result from 0 to 5 hours, in order from the left figure. (c) Initial locations and trajectories of the UAVs.

We conduct more simulations in various environments to investigate the factors that affect search performance. In the tree search algorithm, the search depth $T$ and the number of iterations $N_{\text{iteration}}$ directly affect the search result. The simulation results in Figs. 6(a)-(b) show that the search performance improves with both $T$ and $N_{\text{iteration}}$: the deeper the search, the more distant paths are considered, and the more search iterations, the higher the probability of finding an optimal route. Fig. 6(c) shows that the search performance also improves as the number of agents increases under the distributed algorithm. From these results, we can see that our algorithm is scalable to a large number of robots as well.

Fig. 7 compares the search performance of distributed, centralized, and independent systems. In the independent system, agents explore the area without communication. The centralized system has a central server that gathers all the information regardless of the communication range and calculates paths for all agents. Because the action space of the centralized system ($\mathrm{n}(\mathcal{A})^n$) is much bigger than that of the distributed system ($\mathrm{n}(\mathcal{A})$), we set about 2 times more $N_{\text{iteration}}$ for the centralized system than for the distributed system. Even with limited communication and far fewer iterations, the distributed system performs similarly to the centralized system.

V-B Simulation 2 - real-world dataset environmental learning

This section presents simulation results for exploration in a dynamic environment (Fig. 1), using a real-world meteorological dataset. The simulation uses temperature data collected from weather stations in Seoul, South Korea [34]. We choose the weather data for Seoul because its weather stations are densely distributed (the area of Seoul is 605.25 km²). We create a heat map as the ground truth based on the data measured from 0 am to 5 am on July 16th, 2020.

In this scenario, a team of UAVs flies over the search area and gathers temperature data at each UAV's current location. They fly at a constant velocity of 20 km/h. Because their communication distance is limited to 20 km, they can sometimes be disconnected from one another.

Some snapshots taken during the simulation are shown in Fig. 8. The ground truth over time is shown in Fig. 8(a), and the GP estimation in Fig. 8(b). After 1.7 hours, the estimation result is similar to the ground truth. After that, the results track the true values continuously even as the actual environment changes.

VI Conclusions

This paper presents a fully distributed robotic sensor network for obtaining a global environmental model estimate. We combine the Gaussian process with the Monte Carlo tree search in a distributed manner using peer-to-peer communication. Our method allows multiple robots to collaboratively perform exploration while taking collision avoidance and coordination into account. We validate our algorithm in various environments, including a time-varying temperature monitoring task using a real-world dataset. The results confirm that multiple agents can successfully explore the environment and that the approach scales with an increasing number of agents in the distributed network.

References

  • [1] F. Deng, S. Guan, X. Yue, X. Gu, and J. Chen, “Energy-Based Sound Source Localization with Low Power Consumption in Wireless Sensor Networks,” in IEEE Trans. Industrial Electronics, 2017.
  • [2] C. Matthew, W.-H. Chen, and C. Liu, “Boustrophedon Coverage Path Planning for UAV Aerial Surveys in Wind,” in Proc. Int. Conf. Unman. Air. Sys., 2017.
  • [3] Q. Feng, H. Cai, Y. Yang, J. Xu, M. Jiang, F. Li, X. Li, and C. Yan, “An Experimental and Numerical Study on a Multi-Robot Source Localization Method Independent of Airflow Information in Dynamic Indoor Environments,” Sust. Cit. Soc., 2020.
  • [4] J. Patrikar, B. G. Moon, and S. Scherer, “Wind and the City: Utilizing UAV-Based In-Situ Measurements for Estimating Urban Wind Fields,” in Proc. IEEE/RSJ Int. Conf. on Int. Robot. Sys., 2020.
  • [5] K.-C. Ma, L. Liu, and G. S. Sukhatme, “An Information-driven and Disturbance-Aware Planning Method for Long-Term Ocean Monitoring,” in Proc. IEEE/RSJ Int. Conf. on Int. Robot. Sys., 2016.
  • [6] J. Zhang, R. Liu, K. Yin, Z. Wang, M. Gui, and S. Chen, “Intelligent Collaborative Localization Among Air-Ground Robots for Industrial Environment Perception,” in IEEE Trans. Industrial Electronics, 2019.
  • [7] A. Q. Li, P. K. Penumarthi, J. Banfi, N. Basilico, J. M. O’Kane, I. Rekleitis, S. Nelakuditi, and F. Amigoni, “Multi-Robot Online Sensing Strategies for the Construction of Communication Maps,” Auton. Robots, 2020.
  • [8] Y. Shi, N. Wang, J. Zheng, Y. Zhang, S. Yi, W. Luo, and K. Sycara, “Adaptive Informative Sampling with Environment Partitioning for Heterogeneous Multi-Robot Systems,” in Proc. IEEE/RSJ Int. Conf. on Int. Robot. Sys., 2020.
  • [9] M. L. Elwin, R. A. Freeman, and K. M. Lynch, “Distributed Environmental Monitoring With Finite Element Robots,” IEEE Trans. Robot., 2020.
  • [10] A. Viseras, T. Wiedemann, C. Manss, L. Magel, J. Mueller, D. Shutin, and L. Merino, “Decentralized Multi-Agent Exploration with Online-Learning of Gaussian Processes,” in Proc. IEEE Int. Conf. Robot. Autom., 2016.
  • [11] B. Kartal, E. Nunes, J. Godoy, and M. Gini, “Monte carlo tree search for multi-robot task allocation,” in Proc. AAAI Conf. Art. Intell., 2016.
  • [12] G. Pillonetto, L. Schenato, and D. Varagnolo, “Distributed Multi-Agent Gaussian Regression via Finite-Dimensional Approximations,” IEEE Trans. Pattern Anal. Mach. Intell., 2019.
  • [13] B.C. Levy, “Karhunen Loève Expansion of Gaussian Processes,” Principles of Signal Detection and Parameter Estimation, Springer, 2008.
  • [14] D. Jang, J. Yoo, C. Y. Son, D. Kim, and H. J. Kim, “Multi-Robot Active Sensing and Environmental Model Learning With Distributed Gaussian Process,” IEEE Robot. Autom. Lett., 2020.
  • [15] A. Krause, A. Singh, and C. Guestrin, “Near-Optimal Sensor Placements in Gaussian Processes: Theory, Efficient Algorithms and Empirical Studies,” J. Mach. Learn. Res., 2008.
  • [16] W. Luo, C. Nam, G. Kantor, and K. Sycara, “Distributed Environmental Modeling and Adaptive Sampling for Multi-Robot Sensor Coverage,” in Proc. Int. Conf. Auton. Agents Multiagent Sys., 2019.
  • [17] D. Gu, “Distributed EM Algorithm for Gaussian Mixtures in Sensor Networks,” IEEE Trans. Neural Netw., 2008.
  • [18] G. Flaspohler, V. Preston, A. P. M. Michel, Y. Girdhar, and N. Roy, “Information-Guided Robotic Maximum Seek-and-Sample in Partially Observable Continuous Environments.” IEEE Robot. Autom. Lett., 2019.
  • [19] P. R. Silveria, D. F. Naiff, C. M.N.A. Pereira, and R. Schirru, “Reconstruction of Radiation Dose Rate Profiles by Autonomous Robot with Active Learning and Gaussian Process Regression,” Ann. Nuc. Energy, 2018.
  • [20] K. C. Ma, L. Liu, and G. S. Sukhatme, “Informative Planning and Online Learning with Sparse Gaussian Processes,” in Proc. IEEE Int. Conf. Robot. Autom., 2017.
  • [21] A. Meliou, A. Krause, C. Guestrin, and J. M. Hellerstein, “Nonmyopic Informative Path Planning in Spatio-Temporal Models,” Assoc. Adv. Art. Intell., 2007.
  • [22] L. Bottarelli, M. Bicego, J. Blum, and A. Farinelli, “Orienteering-Based Informative Path Planning for Environmental Monitoring,” Eng. App. Art. Intell., 2019.
  • [23] K. Yang, S. K. Gan, and S. Sukkarieh, “A Gaussian Process Based RRT Planner for the Exploration of an Unknown and Cluttered Environment with a UAV,” Advanced Robotics, 2013.
  • [24] W. Chen, and L. Liu, “Pareto Monte Carlo Tree Search for Multi-Objective Informative Planning,” in Proc. Robot.: Sci. Sys., 2019.
  • [25] J. Choi, S. Oh, and R. Horowitz, “Distributed Learning and Cooperative Control for Multi-Agent Systems,” Automatica, 2009.
  • [26] G. Best, O. M Cliff, T. Patten, R. R Mettu, and R. Fitch, “Dec-MCTS: Decentralized Planning for Multi-Robot Active Perception,” Int. J. Robot. Res., 2019.
  • [27] C. E. Rasmussen, “Gaussian Processes in Machine Learning,” Advanced Lectures on Machine Learning, Springer, 2003.
  • [28] H. Zhu, C. K. I. Williams, R. Rohwer, and M. Morciniec, “Gaussian Regression and Optimal Finite Dimensional Linear Models,” Tech. Rep., Aston University, 1997.
  • [29] R. O. Saber, and R. M. Murray, “Consensus Protocols for Networks of Dynamic Agents,” in Proc. Amer. Cont. Conf., 2003.
  • [30] D. H. Wolpert, S. R. Bieniawski, and D. G. Rajnarayan, “Probability Collectives in Optimization,” Handbook of Statistics 31, 2013.
  • [31] A. Garivier, and E. Moulines, “On Upper-Confidence Bound Policies for Switching Bandit Problems,” in Proc. Int. Conf. Alg. Learning Theory, 2011.
  • [32] N. Srinivas, A. Krause, S. M. Kakade, and M. W. Seeger, “Information-Theoretic Regret Bounds for Gaussian Process Optimization in the Bandit Setting,” IEEE Trans. Info. Theory, 2012.
  • [33] B. Du, K. Qian, C. Claudel, and D. Sun, “Parallelized Active Information Gathering Using Multisensor Network for Environment Monitoring,” IEEE Trans. Cont. Sys. Tech., 2021.
  • [34] National Climate Data Center (NCDC), South Korea, https://data.kma.go.kr/
Dohyun Jang received the B.S. degree in Electrical Engineering from Korea University in 2017, and the M.S. degree in Mechanical and Aerospace Engineering from Seoul National University, Seoul, in 2019. He is currently a Ph.D. Candidate in the School of Aerospace Engineering, SNU. His research interests include distributed systems, networked systems, machine learning, and robotics.
Jaehyun Yoo received the Ph.D. degree in the School of Mechanical and Aerospace Engineering, Seoul National University, Seoul, in 2016. He was a postdoctoral researcher at the School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology, Stockholm, Sweden. He is currently a Professor at the School of AI, Sungshin Women’s University. His research interests include machine learning, indoor localization, automatic control, and robotic systems.
Clark Youngdong Son received the B.S. degree in Mechanical Engineering from Sungkyunkwan University, and the Ph.D. degree in Mechanical and Aerospace Engineering from Seoul National University, in 2015 and 2021, respectively. He is currently a Staff Engineer at Mechatronics R&D Center, Samsung Electronics. His research interests include robotics, path planning, and optimal control.
H. Jin Kim received the B.S. degree from Korea Advanced Institute of Technology (KAIST) in 1995, and the M.S. and Ph.D. degrees in Mechanical Engineering from University of California, Berkeley, in 1999 and 2001, respectively. From 2002 to 2004, she was a Postdoctoral Researcher in Electrical Engineering and Computer Science, UC Berkeley. In 2004, she joined the Department of Mechanical and Aerospace Engineering at Seoul National University as an Assistant Professor, where she is currently a Professor. Her research interests include intelligent control of robotic systems and motion planning.