Bayesian BIM-Guided Construction Robot Navigation with NLP Safety Prompts in Dynamic Environments

Mani Amani1,2 and Reza Akhavian1
Abstract

Construction robotics increasingly relies on natural language processing for task execution, creating a need for robust methods to interpret commands in complex, dynamic environments. While existing research primarily focuses on what tasks robots should perform, less attention has been paid to how these tasks should be executed safely and efficiently. This paper presents a novel probabilistic framework that uses sentiment analysis from natural language commands to dynamically adjust robot navigation policies in construction environments. The framework leverages Building Information Modeling (BIM) data and natural language prompts to create adaptive navigation strategies that account for varying levels of environmental risk and uncertainty. We introduce an object-aware path planning approach that combines exponential potential fields with a grid-based representation of the environment, where the potential fields are dynamically adjusted based on the semantic analysis of user prompts. The framework employs Bayesian inference to consolidate multiple information sources: the static data from BIM, the semantic content of natural language commands, and the implied safety constraints from user prompts. We demonstrate our approach through experiments comparing three scenarios: baseline shortest-path planning, safety-oriented navigation, and risk-aware routing. Results show that our method successfully adapts path planning based on natural language sentiment, achieving a 50% improvement in minimum distance to obstacles when safety is prioritized, while maintaining reasonable path lengths. Scenarios with contrasting prompts, such as "dangerous" and "safe," demonstrate the framework’s ability to modify paths accordingly. This approach provides a flexible foundation for integrating human knowledge and safety considerations into construction robot navigation.

Keywords -

Construction Robotics, Natural Language Processing, Bayesian Inference, Robotic Path Planning, Building Information Modeling (BIM), Sentiment Analysis, Exponential Potential Fields, Dynamic Environments, Robot Safety

Figure 1: Semantic Reasoning using BIM and Natural Prompts to Update Robotic Tasks

1 Introduction

Construction jobsites present unique challenges for autonomous robot navigation due to their inherently dynamic and cluttered nature. Unlike controlled manufacturing environments, construction sites undergo constant changes as work progresses, with temporary structures, moving equipment, materials, and workers creating a complex and evolving workspace. While BIM provides valuable spatial and semantic information for robot planning, its static nature may not fully capture the dynamic reality of construction environments [1]. The discrepancy between BIM data and actual site conditions, combined with the presence of temporary obstacles and moving objects not represented in the BIM, creates significant challenges for safe and efficient robot navigation [2]. These challenges are particularly acute in renovation projects where existing conditions may differ from available BIM data [3]. To address this gap, there is a growing need for adaptive path-planning algorithms that can incorporate both the static spatial context from BIM and dynamic real-time information. Furthermore, integrating probabilistic methods, such as Bayesian inference, offers a pathway to manage uncertainty and variability through unsafe event prediction [4].

Robot planning with natural language processing (NLP) has received significant attention in recent years [5]. The construction industry presents a unique opportunity to use BIM, digital twins, and the rich textual and contextual information they contain to leverage the advancements in robot planning with NLP [6]. Some of the recent promising studies look for predefined mission keywords and commands to execute the intended task [7]. Other work has shown that using LLMs to identify object functions can result in misinterpretations and mission failure [8]. The embedded information in BIM includes a rich spatiotemporal and textual database regarding the mission environment, making them attractive candidates to incorporate into mission planning [9]. Previous works have leveraged schedules, pose estimation, and path planning information from BIM for robotic applications [10]. However, the fidelity of the information representation is crucial to ensure robust real-world implementations, as there might be inaccuracies in the model data. Such errors can cause robot malfunction and poor mission planning for autonomous agents. Therefore, real-time simulation and representation of a 3D map and accurate BIM localization is an active research topic in the construction research field [11].

Moreover, the current research on language-commanded construction robotics is limited to predefined objects and tasks. Both learning-based methods and hard-coded approaches can cause significant compatibility problems in the long run due to different nomenclatures from project A to project B and from one BIM Execution Plan to another. Furthermore, these approaches do not consider the prompter’s authority, knowledge, and reliability. In many cases, the prompter is assumed to be completely reliable and have full authority over the robot’s decision-making. However, in practical applications, it is important to distinguish the rank and the confidence level of the given prompt. Given that most neural networks are trained on large datasets, unseen commands can cause inaccuracies and misunderstandings that can harm robot operations [12]. It is important to identify the confidence and relevance level of prompts to the robot to ensure safe mission planning. To advance upon these shortcomings, we propose a probabilistic framework that uses sentiment analysis from commands to adjust the behavior of the mission given an end goal. Figure 1 depicts the overall idea of the paper.

Bayesian inference has been one of the cornerstones of probability theory. From economics to healthcare, many fields have used Bayesian inference to make predictive decisions from prior and observed data [13], and it has been used extensively in construction planning and decision-making [4, 14]. We propose to use Bayesian updates to consolidate current danger evaluations of the environment with the sentiment received from the prompt for more accurate and dynamic robotic pathfinding which is imperative in cluttered environments such as construction jobsites.

2 Problem Setup

We introduce a framework to extract arbitrary sentiments from user prompts within the parameters of BIM families. While this sentiment analysis method is generalizable to any problem that could use language consolidation, we demonstrate its efficacy in solving robotic pathfinding. The initial iteration of the NLP method uses the family names within the BIM to scale the repulsive coefficient associated with each object. This coefficient determines the "danger level" and can represent dynamism, localization error, cost, or any other heuristic the user chooses to define which families of objects the robot should avoid approaching during path planning. The next step is to incorporate a natural language command. The received command is transformed into a numerical value representing the user's sentiment about the danger of the environment. For example, if the prompt indicates that the environment is dangerous (e.g., "be careful when you go into room A") or, conversely, that it is safe (e.g., "room A should be empty, go quickly"), we can map it to a danger coefficient ranging from 0 to 1. These coefficients are then used to scale any chosen heuristic that alters the robot's behavior. We expand on this in the following sections.

3 Mathematical Framework

3.1 Exponential Potential Field (EPF)

The distance to the nearest obstacle is a widely used metric for object avoidance in robot navigation [15]. Common approaches such as signed distance fields and Euclidean distance fields represent space and occupancy using the closest distance of every obstacle relative to the current coordinates [16]. The distance field can be interpreted as a "potential force" emitted from objects within the traversable areas. This approach was pioneered with the development of artificial potential fields (APF) [17]. APFs are a series of attractive and repulsive potentials that enable path planning for autonomous agents by traversing an environment using repulsive forces from obstacles and an attractive force from the goal. Usually, the repulsive function in these settings is formulated as an exponential function whose value grows as the agent gets closer to an obstacle, resulting in a stronger repulsive force. In this work, we employ a similar exponential function, resulting in an "exponential potential field" or EPF. We use an EPF instead of an APF because our graph traversal approach does not need an attractive potential for navigation.

The motivation behind using EPFs is to account for slight variations between the reconciling of BIM and the real world. The exponential function creates a stronger force for areas closer to obstacles. Furthermore, we formulate a cumulative metric in the form of the summation of all potential values on each grid point using Equation 3 as opposed to the minimum distance done in traditional distance field approaches. This cumulation is more akin to APF for navigation. In complex and dense environments, the cumulative metric tends to result in higher potential values, which naturally encourages paths to avoid these areas whenever possible. We then discretize the map into a grid form to be able to use graph traversal methods for path planning. The calculation of the EPF is shown in Equation 1:

\[
f_{\text{rep}}(x,y,\mathcal{M})=\begin{cases}k_{\text{rep}}\,e^{-D(x,y,\mathcal{M})}, & D(x,y,\mathcal{M})<D_{max}\\ 0, & \text{otherwise}\end{cases} \tag{1}
\]

where $\mathcal{M}$ is the set of points $(x,y)$ that are the points of collision of the 3D object with the 2D plane and $k_{\text{rep}}$ is a scalar that scales the potential value. Equation 1 equals $k_{\text{rep}}e^{-D(x,y,\mathcal{M})}$ when the distance (calculated using Equation 2) is lower than $D_{max}$, which is a threshold to reduce extremely small potential values to ensure numerical stability, and 0 otherwise.

\[
D(x,y,\mathcal{M})=\min_{(x^{\prime},y^{\prime})\in\mathcal{M}}\sqrt{(x-x^{\prime})^{2}+(y-y^{\prime})^{2}} \tag{2}
\]

Each grid point will have a cumulative potential value represented by Equation 3:

\[
G(x_{i},y_{j})=\sum_{k=1}^{O}f_{\text{rep}}(x_{i},y_{j},\mathcal{M}_{k}),\quad\forall(x_{i},y_{j})\in\mathcal{S} \tag{3}
\]

$G(x_{i},y_{j})$ is then used as a heuristic for our path-finding regimen.
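As an illustration of Equations 1 to 3, the following is a minimal NumPy sketch of how the cumulative potential could be evaluated on a grid. The function names, the toy obstacle footprints, and the values chosen for `k_reps` and `d_max` are assumptions made for this example and are not taken from the paper's implementation.

```python
import numpy as np

def repulsive_potential(xs, ys, obstacle_pts, k_rep=1.0, d_max=3.0):
    """Equation 1 for a single obstacle footprint M (illustrative sketch)."""
    gx, gy = np.meshgrid(xs, ys, indexing="ij")      # grid coordinates, shape (nx, ny)
    pts = np.asarray(obstacle_pts, dtype=float)       # footprint points, shape (m, 2)
    dx = gx[..., None] - pts[:, 0]                    # broadcast to (nx, ny, m)
    dy = gy[..., None] - pts[:, 1]
    dist = np.sqrt(dx**2 + dy**2).min(axis=-1)        # Equation 2: nearest footprint point
    return np.where(dist < d_max, k_rep * np.exp(-dist), 0.0)

def cumulative_potential(xs, ys, obstacles, k_reps, d_max=3.0):
    """Equation 3: sum the per-obstacle potentials at every grid point."""
    total = np.zeros((len(xs), len(ys)))
    for pts, k_rep in zip(obstacles, k_reps):
        total += repulsive_potential(xs, ys, pts, k_rep, d_max)
    return total

# Hypothetical 5 x 5 grid with two toy obstacles and assumed coefficients.
xs = np.linspace(0.0, 4.0, 5)
ys = np.linspace(0.0, 4.0, 5)
wall = [(0.0, 0.0), (0.0, 1.0)]
chair = [(4.0, 4.0)]
G = cumulative_potential(xs, ys, [wall, chair], k_reps=[0.2, 0.6], d_max=3.0)
```

Because the contributions are summed rather than reduced with a minimum, grid points that lie within range of several obstacles receive a larger value than they would from the nearest obstacle alone.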

3.2 Multi Heuristic A*

The A* search algorithm is a graph traversal and pathfinding algorithm that is widely used in robotics and computer science [18]. It searches a weighted graph for the lowest-cost path, guided by a heuristic. At its core, A* relies on the cost computed at each node, given by Equation 4:

\[
f(n)=g(n)+h(n) \tag{4}
\]

where $f(n)$ is the total estimated cost of node $n$, $g(n)$ is the cost from the start node to the current node, and $h(n)$ is the heuristic estimate of the cost from the current node to the goal.

Given an admissible heuristic, the algorithm is guaranteed to find the optimal path. A heuristic is admissible if it never overestimates the true cost of reaching the goal; this is commonly ensured through the triangle inequality, as is the case for Euclidean distance. In other words, admissibility guarantees that A* never overestimates the cost at any node and therefore returns the optimal path [19]. However, for complex tasks, it is notoriously difficult and at times impossible to formulate a single admissible heuristic [19]. This is the main motivation behind the multi-heuristic A* (MHA*), which algorithmically handles multiple heuristics around one admissible anchor heuristic [19].

In this problem, we use both distance and the EPF as complementary heuristics for each node. However, the EPF is potentially inadmissible. Therefore, we use one admissible heuristic in the form of Euclidean distance and one potentially inadmissible heuristic in the form of the EPF. While relying on inadmissible heuristics alone sacrifices the optimality guarantee of A*, MHA* retains guarantees on solution quality by employing an admissible anchor heuristic, even when the other heuristics are potentially inadmissible. In this scenario, MHA* uses both the EPF and the Euclidean distance to calculate the path.
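To make the interaction between distance and potential concrete, the sketch below implements a simplified stand-in rather than MHA* itself: a single-queue A* on an 8-connected grid in which the potential of the entered cell is added to the step cost, with Euclidean distance as the heuristic. The function name, the `weight` parameter, and the toy grid are assumptions for illustration only. Since the added potential is non-negative, the Euclidean heuristic remains admissible for this modified cost.

```python
import heapq
import math

def potential_aware_astar(potential, start, goal, weight=1.0):
    """A* where step cost = move length + weight * potential of the entered cell.

    Simplified illustration; this is NOT the multi-heuristic A* of the paper.
    """
    rows, cols = len(potential), len(potential[0])

    def h(node):
        # Euclidean distance to the goal (Equation 4's h term)
        return math.hypot(node[0] - goal[0], node[1] - goal[1])

    open_heap = [(h(start), 0.0, start)]   # entries are (f, g, node)
    g_cost = {start: 0.0}
    parent = {start: None}
    while open_heap:
        _, g, node = heapq.heappop(open_heap)
        if node == goal:
            path = []
            while node is not None:        # walk parents back to the start
                path.append(node)
                node = parent[node]
            return path[::-1]
        if g > g_cost[node]:
            continue                       # stale queue entry, a cheaper one was expanded
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nxt = (node[0] + dr, node[1] + dc)
                if (dr == 0 and dc == 0) or not (0 <= nxt[0] < rows and 0 <= nxt[1] < cols):
                    continue
                new_g = g + math.hypot(dr, dc) + weight * potential[nxt[0]][nxt[1]]
                if new_g < g_cost.get(nxt, math.inf):
                    g_cost[nxt] = new_g
                    parent[nxt] = node
                    heapq.heappush(open_heap, (new_g + h(nxt), new_g, nxt))
    return None

# Toy 5 x 5 potential with a "dangerous" column segment in the middle (assumed values).
zero_field = [[0.0] * 5 for _ in range(5)]
risky_field = [[0.0] * 5 for _ in range(5)]
risky_field[1][2], risky_field[2][2], risky_field[3][2] = 2.0, 5.0, 2.0

straight = potential_aware_astar(zero_field, (2, 0), (2, 4))
detour = potential_aware_astar(risky_field, (2, 0), (2, 4))
```

With the zero field the search returns the straight row from start to goal, while the non-zero field pushes the path around the high-potential cells at the cost of a longer route.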

3.3 Bayesian Inference of NLP Sentiment Analysis

We propose to map the safety sentiment of the BIM families and the NLP prompts to a probabilistic value using LLM reasoning capabilities. The probabilistic value ranges from 0 to 1, and a higher number suggests a higher possibility of dynamism, value, and localization errors due to object geometry or location relative to the sensors. Previous work by the authors has shown the validity of using GPT risk sentiment analysis concerning BIM families [20].

Initially, an EPF is generated using the BIM families present in the model where the robot is planned to be used, using GPT-produced coefficients to scale EPF values. This value will serve as the prior for each of the objects’ EPF scaling factors. The user’s prompt will serve as evidence in the Bayesian inference framework. This gives us a robust mathematical framework to be able to update our scalars. Bayesian inference is given by Equation 5:

\[
P(H|E)=\frac{P(E|H)P(H)}{P(E)} \tag{5}
\]

where $E$ is the evidence or observed state, and $H$ is the hypothesis or our prior.

In this case, we treat our prior hypothesis $H$ as the initial output of the GPT using BIM families. The evidence $E$ will be the output of the GPT given the prompt and the prior information given previous outputs. Since our distribution is discrete given that the commands are an integer number of prompts, we can expand $P(E)$ in the following form:

\[
P(H|E)=\frac{P(E|H)P(H)}{P(E|H)P(H)+P(E|\neg H)P(\neg H)} \tag{6}
\]

The term $P(E|\neg H)P(\neg H)$ in the denominator is problem-dependent. For simplicity, one can interpret the denominator as a scaling constant that can be tuned to the problem constraints.
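A single update of Equation 6 can be sketched as follows. The function name and the numeric values are hypothetical; in particular, how $P(E|\neg H)$ is chosen is left open by the text, so the value used here is only a placeholder.

```python
def bayes_update(prior, lik_given_h, lik_given_not_h):
    """Equation 6: posterior P(H|E) for a binary hypothesis H."""
    numerator = lik_given_h * prior
    denominator = numerator + lik_given_not_h * (1.0 - prior)
    if denominator == 0.0:
        raise ValueError("evidence has zero probability under both hypotheses")
    return numerator / denominator

# Assumed example: prior danger 0.6, prompt supports danger strongly.
posterior = bayes_update(prior=0.6, lik_given_h=0.9, lik_given_not_h=0.3)
```

Here the evidence is three times as likely under $H$ as under $\neg H$, so the posterior rises above the prior; swapping the two likelihoods would lower it instead.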

The formulation in Equation 6 extends to an arbitrary number of prompts used to update our path-finding landscape:

\[
P(H\mid E_{1},\ldots,E_{n})=\frac{P(H)\prod_{i=1}^{n}P(E_{i}\mid H,E_{1},\ldots,E_{i-1})}{P(E_{1},E_{2},\ldots,E_{n})} \tag{7}
\]

Equation 7 is generally intractable because the conditional probability term in the numerator is notoriously difficult to evaluate. In practice, to simplify this computation, we assume that the pieces of evidence are conditionally independent given the hypothesis. This simplifies the problem further:

\[
P(H\mid E_{1},\ldots,E_{n})=\frac{P(H)\prod_{i=1}^{n}P(E_{i}\mid H)}{P(E_{1},E_{2},\ldots,E_{n})} \tag{8}
\]

where

\[
P(E_{1},E_{2},\ldots,E_{n})=\sum_{H^{\prime}}P(H^{\prime})\prod_{i=1}^{n}P(E_{i}\mid H^{\prime}), \tag{9}
\]

yielding the final form:

\[
P(H\mid E_{1},\ldots,E_{n})=\frac{P(H)\prod_{i=1}^{n}P(E_{i}\mid H)}{\sum_{H^{\prime}}P(H^{\prime})\prod_{i=1}^{n}P(E_{i}\mid H^{\prime})} \tag{10}
\]
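The sketch below evaluates Equation 10 for a binary hypothesis space $\{H, \neg H\}$ and checks it against applying Equation 6 one prompt at a time, which gives the same result under the conditional-independence assumption. The function names and likelihood values are assumptions for illustration.

```python
from math import prod

def posterior_batch(prior, liks_h, liks_not_h):
    """Equation 10 with the sum over H' taken over {H, not H}."""
    numerator = prior * prod(liks_h)
    denominator = numerator + (1.0 - prior) * prod(liks_not_h)
    return numerator / denominator

def posterior_sequential(prior, liks_h, liks_not_h):
    """Apply Equation 6 once per prompt, feeding each posterior in as the next prior."""
    p = prior
    for lik_h, lik_not_h in zip(liks_h, liks_not_h):
        p = lik_h * p / (lik_h * p + lik_not_h * (1.0 - p))
    return p

# Two hypothetical prompts, both leaning towards "dangerous".
liks_h = [0.8, 0.6]
liks_not_h = [0.2, 0.4]
batch = posterior_batch(0.5, liks_h, liks_not_h)
sequential = posterior_sequential(0.5, liks_h, liks_not_h)
```

Because the two routes agree, a chain of prompts can be folded in incrementally without storing earlier prompts, as long as the independence assumption is accepted.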

This formulation allows us to chain multiple language commands with varying confidence ratios within our planning framework. Once the final coefficient $P(H\mid E_{1},\ldots,E_{n})$ is calculated, we can use it directly as the $k_{\text{rep}}$ or $D_{max}$ term in Equation 1. Since the prompts change the environment’s potential values, each node’s potential heuristic changes as well. Figure 2 shows an example of the effect of different prompts on the potential field. The potential value of each grid point is altered by the framework’s analysis of prompt safety. The heatmap in Figure 2 denotes safe to dangerous areas using a blue-to-red color spectrum; the values assigned to the colors are the EPF values at each grid point, ranging from 0 for the safest situation to 5 for the most dangerous. Figure 2(a) represents the initial evaluation of the scene, which follows the neutral scaling factors generated from the GPT and BIM families. In Figures 2(b) and 2(c), we see the effects of the prompt on the values of the potential field. When the framework receives a prompt with dangerous sentiment, the potential field increases in strength, as seen in Figure 2(b), which leads the planner to choose a longer but safer path. Conversely, when the framework is presented with a safer prompt, the potential field decreases in strength, as seen in Figure 2(c).

Furthermore, when the potential field is zero across the map, the pathfinding problem reduces to the naive A* algorithm. It is important to note that this probabilistic update is not limited to this specific problem formulation. Any function or heuristic can have its parameters altered given these updates, resulting in different policy realizations.

Figure 2: Comparison of results under different conditions: (a) without using any prompts, (b) using a prompt that implies a dangerous environment, and (c) using a prompt that implies a safe environment.

4 Methodology

The developed framework loads a BIM into a robot simulation platform such as Unity or Gazebo in the form of an FBX file. Once the BIM has loaded, the names of the families are parsed and written to a JSON file for further processing. The JSON file is then passed to an LLM together with an associated prompt. Each object is assigned a scalar based on the LLM’s reasoning, namely a regression value derived from the sentiment of the family’s name in the context of the rest of the BIM. These coefficients are then available for scaling the EPF. Given a prompt or a chain of prompts, the framework updates the coefficients. The prompt is appended with the current state $P(H)$ to yield the current conditioned evidence $P(E|H)$. The new information is parsed from the response of the GPT. We then calculate the new coefficients using Equation 6. Once the coefficients are retrieved, the EPF is recalculated and ready to be used with MHA* for robotic planning. Figure 3 illustrates this process.
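The update loop described above can be sketched roughly as follows. Everything in this block is hypothetical: `stub_llm_likelihoods` is a hard-coded stand-in for a real LLM call, the JSON layout is invented for the example, and setting $P(E|\neg H) = 1 - P(E|H)$ is an assumption of this sketch, not a choice documented in the paper. The numbers are placeholders and are not intended to reproduce the reported coefficients.

```python
import json

def stub_llm_likelihoods(families, prompt):
    """Hypothetical stand-in for the LLM: returns P(E|H) per BIM family."""
    cautious = "dangerous" in prompt.lower()
    return {name: (0.9 if cautious else 0.1) for name in families}

def update_coefficients(priors, prompt, llm=stub_llm_likelihoods):
    """Update every family's danger coefficient with Equation 6."""
    likelihoods = llm(list(priors), prompt)
    updated = {}
    for name, p_h in priors.items():
        p_e_h = likelihoods[name]
        p_e_not_h = 1.0 - p_e_h            # assumption made for this sketch only
        updated[name] = p_e_h * p_h / (p_e_h * p_h + p_e_not_h * (1.0 - p_h))
    return updated

# Assumed JSON layout: BIM family name -> prior danger coefficient.
bim_json = '{"Wall": 0.2, "Chair": 0.6}'
priors = json.loads(bim_json)

dangerous = update_coefficients(priors, "The environment is incredibly dangerous")
safe = update_coefficients(priors, "The environment is incredibly safe")
```

The updated values would then replace the per-family scaling factors before the potential field is recomputed and handed to the planner.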

Figure 3: NLP processing framework

5 Experiments & Results

We consider the shortest path possible and the average Euclidean distance from any given environment as our baseline to evaluate the efficacy of the method. This value is returned by the classical A* algorithm which guarantees the shortest path. The path can be seen in Figure 4.

Figure 4: Baseline Path Calculated by A*

We then experiment with the same starting point using two different prompts. While our theoretical framework allows for the consolidation of arbitrarily many prompts, we examine the single-prompt setting and leave the analysis of the multi-prompt scenario for future work. We choose two extremes to demonstrate the prompt’s effect on the path: "The environment is incredibly safe" for one extreme and "The environment is incredibly dangerous" for the other. Figures 5 and 6 show each condition respectively. There is a clear qualitative and quantitative difference in the length and behavior of the path given only a change in the sentiment of the prompt. Table 1 shows the path length and minimum distance to obstacles (MDO) for each scenario.

The baseline algorithm always returns the shortest possible path due to the guarantees of optimizing a single admissible heuristic. In this case, the heuristic is Euclidean distance, which yields the shortest path but does not take obstacle distance into account. The multi-heuristic approach, however, does, as reflected in Table 1. We can see that the safe prompt improves the obstacle avoidance metric at a small cost in path length. This is due to the small potential field generated as a consequence of a safe prompt. The effects of the prompts on the coefficients that scale the EPF are shown in Tables 2 and 3. Furthermore, we can see a large improvement in the obstacle avoidance metric when the framework is presented with a prompt implying a dangerous environment, even though this comes at the cost of a longer path. This framework enables users to trade off path length against obstacle avoidance while informing the robot of the world state.
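For reference, the two metrics discussed here could be computed along the following lines. The text does not spell out how MDO is evaluated, so this sketch assumes it is the smallest distance between any path waypoint and any obstacle point; the function names and the toy path are likewise assumptions.

```python
import math

def path_length(path):
    """Sum of straight-line segment lengths between consecutive waypoints."""
    return sum(math.dist(a, b) for a, b in zip(path, path[1:]))

def min_distance_to_obstacles(path, obstacle_pts):
    """Smallest waypoint-to-obstacle distance (assumed definition of MDO)."""
    return min(math.dist(p, o) for p in path for o in obstacle_pts)

# Toy path and a single obstacle point, in metres.
toy_path = [(0.0, 0.0), (1.0, 0.0), (2.0, 1.0)]
toy_obstacles = [(1.0, 1.0)]
length = path_length(toy_path)
mdo = min_distance_to_obstacles(toy_path, toy_obstacles)
```

Evaluating the distance only at waypoints can overestimate the true clearance on coarse paths, so a denser sampling of each segment may be preferable in practice.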

Figure 5: Calculated path given safe prompt
Figure 6: Calculated path given dangerous prompt
Table 1: Bayesian path metrics for all scenarios

| Strategy | Path Length (m) | MDO (m) |
|---|---|---|
| Baseline | 3.286 | 0.40 |
| Safe | 3.288 | 0.43 |
| Dangerous | 3.922 | 0.60 |

Table 2: Bayesian update, safe scenario

| BIM family | Prior Coef. | New Coef. | Updated Coef. |
|---|---|---|---|
| Wall | 0.2 | 0.02 | 0.03 |
| Grinder | 0.8 | 0.08 | 0.78 |
| Chainsaw | 0.95 | 0.09 | 0.90 |
| Robot | 0.9 | 0.09 | 0.91 |
| Chair | 0.6 | 0.06 | 0.21 |

Table 3: Bayesian update, dangerous scenario

| BIM family | Prior Coef. | New Coef. | Updated Coef. |
|---|---|---|---|
| Wall | 0.3 | 0.8 | 0.01 |
| Grinder | 0.7 | 0.9 | 0.59 |
| Chainsaw | 0.9 | 1 | 0.59 |
| Robot | 0.8 | 1 | 0.51 |
| Chair | 0.1 | 0.6 | 0.43 |
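
The relative changes implied by Table 1 can be checked with a few lines of arithmetic. The helper name below is an assumption; the numbers are copied from Table 1.

```python
def relative_change(baseline, value):
    """Percentage change of value relative to baseline."""
    return (value - baseline) / baseline * 100.0

# (path length in m, MDO in m) per strategy, taken from Table 1.
table1 = {
    "Baseline": (3.286, 0.40),
    "Safe": (3.288, 0.43),
    "Dangerous": (3.922, 0.60),
}

base_len, base_mdo = table1["Baseline"]
changes = {
    name: (relative_change(base_len, length), relative_change(base_mdo, mdo))
    for name, (length, mdo) in table1.items()
    if name != "Baseline"
}
```

For the dangerous prompt this gives an MDO increase of about 50% for a path that is roughly 19% longer, while the safe prompt raises MDO by about 7.5% with a path length change below 0.1%.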

6 Discussion & Limitations

The described methodology presents a computationally efficient and transparent opportunity for consolidating and analyzing sentiment from NLP commands. However, its integration with different planning strategies remains to be explored. Currently, the integration is with a heuristic-based approach, meaning that the algorithm finds an optimal policy given the heuristic of choice. This may or may not be effective given different conditions and prompts. For example, if the space only has one possible route from the starting point to the goal point, no matter how strong the prompt and by extension the EPF might be, the only viable solution would be that route. The optimal policy can remain unchanged if the prompt provides slight differences in heuristic parameters. Other approaches must be considered to integrate Bayesian consolidation into a more granular and flexible planning strategy.

Furthermore, as with all state-of-the-art LLMs, there is always a possibility of hallucinations, which yield inaccuracies such as incorrect likelihood coefficients. To quantify the reliability of GPT 3.5-turbo for this task, we experimented with two prompts and a baseline evaluation with no prompts over 100 iterations to measure how consistent the model is when prompted to perceive semantic information. Table 4 presents the statistics regarding GPT 3.5-turbo output stability. In this table, the values represent the danger levels associated with each BIM family when no prompts are given, with a prompt implying a dangerous environment, and with a prompt implying a safe environment. It can be seen that the GPT performs reasonably well in assigning new danger values to most BIM objects with respect to the sentiment of the prompt. This comparison over several iterations indicates that the likelihood of false or inaccurate coefficients is relatively low for simple prompts and NLP commands such as those needed for the proposed framework to generate the expected results. While GPT 3.5-turbo is a fast and cost-effective choice, it lacks the advanced reasoning capabilities of more modern counterparts such as GPT 4 [21]. To ensure better parameter scaling and lower variance, using more advanced GPT models such as GPT 4o or GPT o1 is recommended.

Table 4: Danger levels from 100 iterations to assess the reliability of GPT 3.5-turbo in understanding prompt semantics

| BIM family | Original Value | Safe Prompt | Dangerous Prompt |
|---|---|---|---|
| Wall | 0.27 ± 0.18 | 0.18 ± 0.17 | 0.43 ± 0.31 |
| Grinder | 0.71 ± 0.15 | 0.60 ± 0.33 | 0.74 ± 0.19 |
| Robot | 0.60 ± 0.11 | 0.50 ± 0.28 | 0.67 ± 0.18 |
| Chainsaw | 0.74 ± 0.17 | 0.61 ± 0.34 | 0.75 ± 0.19 |
| Chair | 0.32 ± 0.14 | 0.21 ± 0.15 | 0.44 ± 0.27 |

7 Conclusion and Future Works

This paper proposes a new approach to consolidate and use natural language prompts for robotic navigation in construction contexts. The commands are entered into a Bayesian framework that will affect the robot’s heuristic parameters. Results show a robust alignment of mission planning with different desired metric values.

For future work, we plan to expand on this topic by introducing multiple prompts from different prompters. In this paper, we have shown theoretically that the framework can accommodate multiple prompts. The goal is for the robot to understand the environment and the policy to implement in case of conflicting commands with appropriate confidence levels for each separate command, rather than to determine the task to execute in normal situations.

8 Acknowledgments

The presented work has been supported by the U.S. National Science Foundation (NSF) CAREER Award through grant No. CMMI 2047138, and grant No. DUE 1930546. The authors gratefully acknowledge the support from the NSF. Any opinions, findings, conclusions, and recommendations expressed in this paper are those of the authors and do not necessarily represent those of the NSF.

References