
RPF-Search: Field-based Search for Robot Person Following
in Unknown Dynamic Environments

Hanjing Ye, Kuanqi Cai, Yu Zhan, Bingyi Xia, Arash Ajoudani, and Hong Zhang∗
∗Equal contributions; equal supervision. Corresponding author: Hong Zhang (hzhang@sustech.edu.cn). Hanjing Ye, Yu Zhan, Bingyi Xia, and Hong Zhang are with SUSTech, China; Kuanqi Cai and Arash Ajoudani are with IIT, Italy.
Abstract

Autonomous robot person-following (RPF) systems are crucial for personal assistance and security but suffer from target loss due to occlusions in dynamic, unknown environments. Current methods rely on pre-built maps and assume static environments, limiting their effectiveness in real-world settings. There is a critical gap in re-finding targets under topographic (e.g., walls, corners) and dynamic (e.g., moving pedestrians) occlusions. In this paper, we propose a novel heuristic-guided search framework that dynamically builds environmental maps while following the target and resolves various occlusions by prioritizing high-probability areas for locating the target. For topographic occlusions, a belief-guided search field is constructed and used to evaluate the likelihood of the target’s presence, while for dynamic occlusions, a fluid-field approach allows the robot to adaptively follow or overtake moving occluders. Past motion cues and environmental observations refine the search decision over time. Our results demonstrate that the proposed method outperforms existing approaches in terms of search efficiency and success rates, both in simulations and real-world tests. Our target search method enhances the adaptability and reliability of RPF systems in unknown and dynamic environments to support their use in real-world applications. Our code, video, experimental results and appendix are available at https://medlartea.github.io/rpf-search/.

Refer to caption
(a)
Refer to caption
(b)
Refer to caption
(c)
Figure 1: RPF-Search addresses the challenges of person search in unknown, dynamic environments. The proposed approach enables the system to search for the target person in promising directions under topographic occlusion, as demonstrated in (a), despite a rough trajectory prediction. Additionally, the proposed method adapts to dynamic occlusions, either overtaking the occluders (b) or following them (c) based on their motion patterns and observed environments.

I INTRODUCTION

Robot person following[1, 2] (RPF) plays a vital role in human-robot interaction, with applications in personal assistance, security, and service robots. RPF is essential for tasks like elderly care and guided tours, especially in complex and dynamic environments. However, target occlusions due to static objects or moving pedestrians can cause tracking failures. This points to a need for the robot to perform a search task to re-find the target person – a critical component of robust RPF systems, as illustrated in Fig. 1.

Traditional search methods assume known environments and treat the task as a coverage problem[3], maximizing visibility within limited timeframes. These approaches use prior information about both the environment and the target person, such as metric graphs[4], topo-metric graphs[3, 5], or semantic graphs[6], to locate lost targets. Subsequent searches rely on criteria like frequently visited nodes in topo-metric graphs or objects associated with the person in semantic graphs. While effective in static or known environments, these methods often struggle in complex and unknown settings. Exploration-based techniques[7, 8, 9] select goals from frontiers of the searched region via utility metrics or deep reinforcement learning[9], but neglect the motion cues of the lost person, resulting in inefficient searches.

Additionally, some methods leverage trajectory prediction models, such as linear regression[10], Bayesian regression[11], and SVM-based regression[12], to estimate the target’s future position. If unsuccessful, the search often defaults to frontier navigation, repeating the process[12]. Such a greedy strategy ignores environmental dynamics, leading to extended searches. Furthermore, these methods neglect the dynamic behavior of other people in a real environment, resulting in dynamic occlusion of the target person and the risk of robot collision.

To address these limitations, we propose a field-based heuristic-guided search framework that incrementally maps the environment while following the target. The framework prioritizes areas with a high probability of target existence while maximizing environmental exploration and adapts to different types of occlusions:

  • Topographic Occlusion: For static obstacles like walls or corners, a belief-guided search field estimates target probability, optimizing information gain, trajectory prediction, path efficiency, and collision avoidance.

  • Dynamic Occlusion: When moving pedestrians obstruct the target, the robot switches to a fluid-following field to follow the occluders or adopts an observation-based potential field to overtake them by analyzing their motion patterns.

Unlike traditional map-dependent methods, our approach dynamically constructs maps during RPF by focusing on high-probability areas instead of uniform exploration typically employed in active SLAM [7, 8]. This probability representation is derived from our proposed incrementally constructed search fields, which leverage target motion cues and recently observed environmental information. As illustrated in Fig. 2, this adaptive strategy ensures efficient and responsive searching in evolving environments, thereby enhancing the reliability of an RPF system.

II Related Work

II-A Person Search Under Topographic Occlusion

Person search in RPF aims to relocate a lost target. Existing methods often estimate the person’s position using a pre-built map. For example, Goldhoorn et al.[13] employ a Partially Observable Markov Decision Process (POMDP) algorithm to estimate the target location on an occupancy map. To mitigate POMDP’s high computational cost, they later introduced a particle-filter-based approach[4], propagating particles from the target’s last known location and assigning higher weights to those in visible free spaces. The particle with the highest weight is selected as the next search goal.

Beyond occupancy maps, topo-metric graphs[3, 5] and semantic graphs[6] have been used. For instance, Isler et al.[3] employ polygon triangulation to transform a metric map into a tree and use game theory to identify the optimal search node. Some methods incorporate prior knowledge to expedite the search. Bayoumi et al.[5] utilize frequently visited locations in a topo-metric graph, modeling belief states with a hidden Markov model and updating them based on new observations. Similarly, semantic graphs have been used to predict navigation goals based on human-object interactions[6]. These approaches rely on prior information, often framing person search as a coverage problem. However, such prior knowledge is difficult to obtain in real-world RPF scenarios. A practical RPF system must enable robots to follow targets seamlessly in unfamiliar environments, requiring a way to search without pre-built maps. Some methods bypass the need for prior information by leveraging historical trajectories to predict likely destinations. To handle sudden occlusions, techniques such as Kalman filters[14], variational Bayesian regression[11], and least-squares linear regression[10] forecast future trajectories. Although map-independent, these methods rely on single predictions, limiting their adaptability in unknown environments.

Person search in unknown environments inherently combines exploration with target search. Active SLAM[8] is a closely related research topic, focusing on efficient exploration by selecting optimal frontiers using utility-based metrics[7] or deep reinforcement learning policies[9]. Placed et al.[8] introduce a graph-SLAM utility function to balance localization accuracy and exploration efficiency. Unlike these methods, which prioritize mapping, our approach focuses on effective person search. By detecting frontiers, we construct a belief-guided search field representing the likelihood of the target location. This field is incrementally updated using the target’s past motion cues and recent environmental observations, ensuring efficient and adaptive search.

II-B Person Search Under Dynamic Occlusion

Following a target person in a dynamic environment is challenging, as objects such as people can obscure a robot’s view, causing tracking failures. Effective strategies for searching and re-identifying the target after occlusion are essential. A key difficulty lies in navigating through a dynamic crowd, where one approach is to follow the crowd until the target is re-identified. Crowd-aware navigation methods address this challenge by modeling pedestrian flows. Henry et al.[15] used inverse reinforcement learning (IRL) with Gaussian processes for human-like navigation. Cai et al.[16] proposed a sampling-based flow map algorithm but assumed complete knowledge of pedestrian positions. Dugas et al.[17] introduced a pseudo-fluid model with an observation-based flow-aware planner, while Jacques et al.[18] developed the “Flow Grid” model, enabling the robot to adapt to human motion and optimize paths using a modified A* algorithm. These methods excel in dense crowds or with fast-moving occluders but are less effective when occluders are sparse or slow. In such scenarios, overtaking occluders to re-identify the target becomes a practical and efficient strategy.

Overtaking maneuvers have been widely studied in autonomous driving [19], with approaches generally divided into learning-based and model-based methods. Learning-based methods, such as Chen et al.[20] applying deep reinforcement learning, model human-like overtaking but require large datasets and extensive simulations and face challenges in generalization and explainability. In contrast, model-based methods offer greater interpretability and efficiency. Schwarting et al.[21] developed a nonlinear model predictive control (NMPC) approach accounting for road boundaries, vehicle dynamics, and uncertainties in other traffic participants. Ma et al.[22] used potential field-based methods to enhance safety and comfort during overtaking by incorporating velocity, acceleration, and road boundaries. While effective in vehicular contexts, these methods are not designed for search tasks.

This paper presents a potential field-based strategy to handle occlusions in dynamic environments. By adaptively switching between following and overtaking occluders based on their motion cues, the proposed approach ensures efficient target re-identification in diverse social navigation scenarios.

Refer to caption
Figure 2: The framework of our RPF-Search consists of three conditions: Occ, DO, and TO, representing "is occlusion", "is dynamic occlusion", and "is topographic occlusion", respectively. The search process is divided into two categories: topographic occlusion from environmental corners and dynamic occlusion caused by moving pedestrians. For topographic occlusion, we use a belief-guided search field to optimize target search by selecting promising searching points based on information gain, target probability, motion efficiency, and collision risk. For dynamic occlusion, motion cues from pedestrians guide whether to follow via a fluid-field approach or overtake using an observation-based potential field.
Refer to caption
Figure 3: Simplified diagram of the belief-guided search field update process. When the target is lost, the existence probability \mathcal{V}_e, estimated through trajectory prediction, guides the search. The inference factor \eta penalizes unlikely candidates based on path simulations. To handle limited visibility, probability inheritance \mathcal{V}_p balances exploration inside and outside the room. This belief-guided field enables the robot to efficiently search the most promising locations to locate the target in an unknown environment.

III Method

The framework of the proposed method is illustrated in Fig. 2. Our approach tracks people and identifies the target person based on our previous work [23]. Additionally, we utilize visual-inertial odometry for robot localization and employ Octomap [24], a probabilistic mapping method that remains robust to dynamic objects, to map unknown environments.

Leveraging the above perception information, this paper primarily addresses the challenge of person search in unknown and dynamic environments, particularly when the target person is lost due to occlusions. To differentiate between types of occlusions—topographic and dynamic—we use an intersection-over-union (IoU) analysis. If the occluder is identified as a pedestrian via the IoU assessment (i.e., the target person’s bounding box previously interacted with another pedestrian), the system activates an Observation-based Search Field (Sec. III-B) to resume target tracking under dynamic occlusions. In contrast, the presence of static obstacles prompts the deployment of a Belief-guided Search Field (Sec. III-A), enabling smooth adaptation to static occlusions. Detailed method flow can be found in the supplementary materials.
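For concreteness, a minimal sketch of such an IoU-based occlusion classifier is given below; the threshold value and function names are our own illustrative assumptions, not the system's actual implementation.

```python
# Minimal sketch of the IoU-based occlusion-type check (Sec. III).
# Boxes are (x1, y1, x2, y2); the 0.3 threshold is an assumed value.

def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda box: (box[2] - box[0]) * (box[3] - box[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def classify_occlusion(last_target_box, pedestrian_boxes, thresh=0.3):
    """'dynamic' if a pedestrian box overlapped the target box before loss."""
    if any(iou(last_target_box, p) > thresh for p in pedestrian_boxes):
        return "dynamic"      # -> observation-based search field (Sec. III-B)
    return "topographic"      # -> belief-guided search field (Sec. III-A)
```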

III-A Belief-guided search field

Our approach identifies the optimal search goal using the partially built map, past target trajectory data, and the robot's current position. It involves three steps: candidate generation, weight calculation, and belief-guided field establishment; a simplified diagram is shown in Fig. 3.

III-A1 Candidate Generation

When the target disappears, the robot constructs a belief-guided field to represent the probability of the target's location. Search candidates are generated using current and historical observations to focus on high-belief regions. In a partially mapped environment, a key step in our solution is identifying frontiers: points on the boundary between mapped and unknown regions. These frontiers serve as both observation points and safe navigation locations due to their proximity to mapped free space. The belief-guided field is initialized with high values at these frontier points.

Frontiers are detected using Canny edge detection on the map[8], with contour centers identified as candidates \{\mathcal{F}_{i}\}. These candidates form the high-belief regions guiding the robot's search. With these candidates, we define the belief-guided field and obtain the searching point \mathbf{x}^{*} as follows:

\mathbf{x}^{*}=\arg\max_{\mathbf{x}}\left\{W_{i}\exp\left(-\frac{\|\mathbf{x}-\mathcal{F}_{i}\|^{2}}{2\sigma^{2}}\right)\right\}, \qquad (1)

where \mathbf{x}=[x,y] represents a position in the environment, \sigma is the Gaussian variance, and W_{i} is the weight of the i-th candidate, estimated from the target's motion cues and environmental information. The maximum operator ensures that overlapping Gaussians prioritize the highest probability, capturing the most likely location.
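A minimal sketch of evaluating Eq. (1) over a discretized set of query positions is shown below, assuming the frontier candidates and their weights have already been computed; the array shapes are our own conventions.

```python
import numpy as np

def search_goal(queries, frontiers, weights, sigma=1.0):
    """Evaluate the belief-guided field of Eq. (1) and return x*.

    queries:   (M, 2) query positions x
    frontiers: (K, 2) frontier candidates F_i
    weights:   (K,)   candidate weights W_i
    """
    d2 = np.sum((queries[:, None, :] - frontiers[None, :, :]) ** 2, axis=2)
    field = weights[None, :] * np.exp(-d2 / (2.0 * sigma ** 2))
    belief = field.max(axis=1)           # max over overlapping Gaussians
    return queries[np.argmax(belief)]    # searching point x*
```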

III-A2 Candidate Weight Calculation

The utility of each candidate is quantified by expected uncertainty in three target variables: information gain \mathcal{V}_{g}, collision risk \mathcal{V}_{c}, and target existence \mathcal{V}_{e}. The candidate weight is computed as:

W_{i}=\eta_{i}\left(\mathcal{V}_{g}(\mathcal{F}_{i})+\mathcal{V}_{c}(\mathcal{F}_{i})+\mathcal{V}_{e}(\mathcal{F}_{i})+\mathcal{V}_{p}(\mathcal{F}_{i})\right), \qquad (2)

where \mathcal{V}_{p} represents probabilistic inheritance, capturing probability transfer from farther candidates, and \mathcal{V}_{g}, \mathcal{V}_{c}, and \mathcal{V}_{e} are normalized terms explained below.

Inference Factor \eta_{i}: To reduce computational load and quickly locate the target, the inference factor \eta_{i} filters candidates based on the maximum distance a pedestrian could have traveled since last observed, computed from the maximum pedestrian velocity. If the distance between a candidate and the target's last known position exceeds this estimate, the candidate would already have been observed had the target moved in that direction, so further weight calculation for such candidates is unnecessary. The inference factor \eta_{i} is defined as:

\eta_{i}:=\mathbb{I}\left(\Delta t\cdot v_{\max}>D_{\text{ijk}}(P_{\text{lost}},\mathcal{F}_{i})\right), \qquad (3)

where \mathbb{I} is the indicator function, returning one if the condition holds and zero otherwise, \Delta t is the time elapsed since the target was last seen, and D_{\text{ijk}}(P_{\text{lost}},\mathcal{F}_{i}) is the Dijkstra-computed distance between the lost position P_{\text{lost}} and the candidate \mathcal{F}_{i} in the map.
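A sketch of this pruning test follows; the 4-connected grid Dijkstra below stands in for D_ijk and assumes an occupancy-grid map where free cells are 0, which is our own simplification.

```python
import heapq
import numpy as np

def dijkstra_distance(grid, start, goal):
    """Shortest-path distance on a 4-connected occupancy grid (free cells
    are 0), standing in for D_ijk(P_lost, F_i); np.inf if unreachable."""
    h, w = grid.shape
    dist = {start: 0.0}
    pq = [(0.0, start)]
    while pq:
        d, (r, c) = heapq.heappop(pq)
        if (r, c) == goal:
            return d
        if d > dist.get((r, c), np.inf):
            continue
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w and grid[nr, nc] == 0:
                nd = d + 1.0
                if nd < dist.get((nr, nc), np.inf):
                    dist[(nr, nc)] = nd
                    heapq.heappush(pq, (nd, (nr, nc)))
    return np.inf

def inference_factor(delta_t, v_max, grid, p_lost, cand):
    """Eq. (3): a candidate survives only if the target could have reached it."""
    return 1.0 if delta_t * v_max > dijkstra_distance(grid, p_lost, cand) else 0.0
```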

Information Gain \mathcal{V}_{g}(\cdot): Information gain prioritizes maximizing observation of unknown areas, using entropy to quantify exploration uncertainty [7]. The expected entropy of a candidate is:

\mathcal{V}_{g}(\mathcal{F}_{i})=-\sum_{c\in\mathrm{Adj}(\mathcal{F}_{i})}p(c)\log_{2}p(c), \qquad (4)

where p(c) is the occupancy probability of cell c, and \mathrm{Adj}(\cdot) denotes the cells adjacent to the candidate in the map.
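The entropy term of Eq. (4) can be computed directly from the occupancy probabilities of a candidate's neighborhood, as in the short sketch below; the clipping is our own addition for numerical stability.

```python
import numpy as np

def information_gain(occ_probs):
    """Eq. (4): entropy over the occupancy probabilities of the cells
    adjacent to a candidate; p is clipped away from 0 and 1 for stability."""
    p = np.clip(np.asarray(occ_probs, dtype=float), 1e-6, 1.0 - 1e-6)
    return float(-np.sum(p * np.log2(p)))
```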

Collision Risk \mathcal{V}_{c}(\cdot): This factor prevents candidates from being too close to obstacles or (non-target) pedestrians, reducing collision risk and avoiding intrusion into humans' comfort space:

\mathcal{V}_{c}(\mathcal{F}_{i})=\frac{\max(\Phi(\mathcal{F}_{i}))}{R(\mathcal{F}_{i})+\delta}, \qquad (5)

where \Phi(\cdot) represents the comfortable region around humans [25], R(\cdot) is the collision probability with obstacles [26], and \delta ensures numerical stability.

Target Person Existence \mathcal{V}_{e}(\cdot): To efficiently locate the target, \mathcal{V}_{e} evaluates candidates based on predicted potential locations when the target abruptly disappears. Unlike traditional trajectory prediction relying on global maps, \mathcal{V}_{e} uses human motion cues to estimate potential directions within a limited observation map. It consists of two steps: 1) initial trajectory prediction and 2) probability propagation.

III-A2a Initial Trajectory Prediction

A regression model is constructed from the target's historical motion cues, including velocities and positions, to forecast potential trajectories. We use support vector machine regression (SVR), which effectively captures nonlinear relationships and provides accurate fits in constrained environments [12]. The prediction is framed as finding a function [x_{t},y_{t}]^{T}=\mathbf{w}^{T}\psi(t)+\mathbf{b}, where \mathbf{w} is the weight vector, \mathbf{b} is a constant bias vector, and \psi represents a non-linear mapping. Using the target's historical timestamps and positions as the training dataset, the function is learned with SVR theory, enabling estimation of the target's position at any future timestamp.
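Assuming scikit-learn's SVR as the regressor (the paper does not name a library, and the kernel and hyperparameters below are illustrative), one per-coordinate model can be fit as in this sketch.

```python
import numpy as np
from sklearn.svm import SVR

def fit_trajectory_svr(timestamps, positions):
    """Fit one SVR per coordinate, [x_t, y_t] = w^T psi(t) + b (Sec. III-A2a).

    timestamps: (N,) array of past observation times
    positions:  (N, 2) array of past target positions
    """
    t = np.asarray(timestamps).reshape(-1, 1)
    models = [SVR(kernel="rbf", C=10.0, epsilon=0.05).fit(t, positions[:, k])
              for k in range(2)]

    def predict(future_t):
        ft = np.asarray(future_t).reshape(-1, 1)
        return np.stack([m.predict(ft) for m in models], axis=1)  # (M, 2)

    return predict
```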

III-A2b Probability Propagation

Unlike trajectory predictions using accurate global maps, our algorithm predicts trajectories based on historical human cues and a limited map generated by the robot’s observations. The predicted trajectory is transformed into a probabilistic representation using a Gaussian-based method combined with motion cues. The probability of a point in the trajectory is defined as:

\mathcal{N}_{\mu(t),\Sigma(t)}(\mathbf{x})=e^{-\frac{1}{2}(\mathbf{x}-\mu(t))^{T}\Sigma(t)^{-1}(\mathbf{x}-\mu(t))}, \qquad (6)

where \mu(t)=[\mu_{x,t},\mu_{y,t}] represents a predicted point, and \Sigma(t), the covariance matrix, reflects the probable area of the target based on the velocity v and social distance d_{s} [25]:

\Sigma(t)=\mathrm{Rot}(\varphi_{t})\cdot\mathrm{diag}(vd_{s},\,d_{s})^{2}\cdot\mathrm{Rot}(\varphi_{t})^{T}, \qquad (7)

where \varphi_{t}, estimated from adjacent points, defines the orientation, resulting in an elliptical probability area along the predicted direction. For N predicted points, the probability of the target's appearance is:

\mathcal{V}_{e}(\mathbf{x})=\sum_{t=t_{0}}^{t_{N}}\omega_{t}\,\mathcal{N}_{\mu(t),\Sigma(t)}(\mathbf{x}), \qquad (8)

where \omega_{t} is the weight of each component. To account for increasing uncertainty over time, the weights decrease monotonically:

\omega_{t}=\frac{e^{-\lambda t}}{\sum_{t=t_{0}}^{t_{N}}e^{-\lambda t}},\quad\text{with}\quad\sum_{t=t_{0}}^{t_{N}}\omega_{t}=1. \qquad (9)
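A sketch combining Eqs. (6)-(9) is given below; the time indices are simplified to 0..N-1 and all array shapes are our own conventions.

```python
import numpy as np

def existence_probability(query, mus, phis, vels, d_s, lam):
    """Eqs. (6)-(9): weighted sum of oriented Gaussians along the predicted
    trajectory.

    query: (2,) position x;   mus: (N, 2) predicted points mu(t)
    phis:  (N,) headings;     vels: (N,) speeds
    d_s:   social distance;   lam: temporal decay rate lambda
    """
    N = len(mus)
    w = np.exp(-lam * np.arange(N))
    w /= w.sum()                                            # Eq. (9)
    total = 0.0
    for k in range(N):
        c, s = np.cos(phis[k]), np.sin(phis[k])
        R = np.array([[c, -s], [s, c]])
        Sigma = R @ np.diag([vels[k] * d_s, d_s]) ** 2 @ R.T  # Eq. (7)
        diff = np.asarray(query) - mus[k]
        total += w[k] * np.exp(-0.5 * diff @ np.linalg.solve(Sigma, diff))
    return total
```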

Probabilistic Inheritance \mathcal{V}_{p}(\cdot): In unstructured environments, obstacles can significantly restrict visibility, causing the robot to miss detections even at optimal candidate positions. For instance, in Room 1 (Fig. 6(a)), the target (blue dot) is hidden behind Obstacle 1 and Obstacle 2. Without comprehensive exploration of the room, the robot is likely to miss the target at this location. To address this, we propose a probabilistic inheritance strategy that balances exploration inside and outside the room. The probability of a candidate \mathcal{F}_{i} is inherited from its parent candidate \mathrm{Parent}(\mathcal{F}_{i}), the closest candidate from the previous generation. The inherited probability is defined as:

\mathcal{V}_{p}(\mathcal{F}_{i})=P_{\mathrm{parent}}\cdot e^{-d_{p}/\varepsilon}, \qquad (10)

where d_{p} is the distance between \mathcal{F}_{i} and \mathrm{Parent}(\mathcal{F}_{i}); the inherited probability decays with increasing distance. The normalization factor \varepsilon adapts to the room size, ensuring the probability decay matches the environment. If candidates inside the room have lower probabilities than earlier candidates outside it, the robot transitions from searching inside the room to exploring outside.
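Both the inheritance term and the final candidate weight are simple to compute once the component terms are available; a minimal sketch, under the assumption that the four V terms are pre-normalized, follows.

```python
import numpy as np

def inherited_probability(parent_prob, d_p, eps):
    """Eq. (10): belief inherited from the parent candidate, decaying with
    the distance d_p; eps is adapted to the room size."""
    return parent_prob * np.exp(-d_p / eps)

def candidate_weight(eta, v_g, v_c, v_e, v_p):
    """Eq. (2): total candidate weight; eta is the 0/1 inference factor of
    Eq. (3) and the four V terms are assumed normalized beforehand."""
    return eta * (v_g + v_c + v_e + v_p)
```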

III-B Observation-based search field

Another significant challenge in accurately tracking a target in dynamic environments is dynamic occlusion. Moving pedestrians can obstruct the robot’s view, causing it to lose track of the target. For instance, the target may enter a crowd or random people may walk between the robot and the target. Unlike the belief-guided field used for topographic occlusions, we propose an observation-based search field leveraging the motion cues of occluders to handle dynamic occlusions.

Dynamic occlusions fall into two types. Temporary occlusion occurs when a pedestrian briefly crosses between the target and the robot; the occlusion is short enough that the robot can quickly resume tracking without altering its strategy, maintaining the previous track. Long-term occlusion occurs when occluders persist in the robot's view, often resulting in track loss. In this case, we choose between two strategies based on the motion cues of the occluder(s): 1) an observation-based potential field overtaking strategy, and 2) a fluid field-based following strategy. Both strategies use vector fields and are triggered when the target is obscured by occluders.

Refer to caption
Figure 4: Overtaking Visualization: During dynamic occlusion, the robot constructs an overtaking field using the occupancy map and occluder motion cues to optimize target observation, path cost, and collision avoidance. Lighter areas indicate lower costs (max cost = 5.0). The robot overtakes the occluder (red) by selecting an optimal goal (orange) from candidate points (blue). Initially, the field is vertically symmetric (first row). As the robot moves to the lower region, reduced path costs in this area (lighter colors) lead the robot to consistently choose the same overtaking point, resulting in a smooth overtaking trajectory (orange path in the last row).
Refer to caption
Figure 5: Fluid-Following Visualization: In multi-person occlusion scenarios where overtaking isn’t feasible, the robot constructs a fluid-following field using motion cues from visible occluders (red circles). The first row displays the velocity field of the pseudo-fluid, the second row shows the density field to avoid high-density areas, and the final row combines these into the fluid-following field. Using this field, the robot selects an optimal goal (orange) from candidate points (blue), ensuring safe navigation along the pseudo-fluid path while avoiding high-density zones.

III-B1 Observation-based potential field overtake strategy

When the target is lost due to dynamic occlusion, a natural human response is to overtake the occluders to regain visibility and resume following. To this end, we propose an observation-based potential field overtaking strategy that leverages occluder motion cues. The strategy predicts the occluder trajectory U_{\mathrm{occluder}} and constructs an overtaking field from repulsive and attractive forces, inspired by the artificial potential field method. The repulsive force [27] prevents collisions with pedestrians (including occluders) and obstacles:

\mathbf{F}_{\text{rep}}(\mathbf{x})=\begin{cases}\eta_{f}\left(\frac{1}{\|\varpi\|}-\frac{1}{d_{0}}\right)\frac{\varpi}{\|\varpi\|^{3}},&\text{if }\|\varpi\|\leq d_{0},\\0,&\text{if }\|\varpi\|>d_{0},\end{cases} \qquad (11)
\varpi=\mathbf{x}-\underset{\mathbf{u}\in U_{\mathrm{occluder}}}{\arg\min}\,\|\mathbf{x}-\mathbf{u}\|, \qquad (12)

where \eta_{f} is the repulsive gain coefficient, and d_{0} is the influence distance beyond which the repulsive force ceases. The attractive force guides the robot to optimal positions, using a two-step process, Filter and Plan:

\mathbf{F}_{\text{att}}(\mathbf{x})=\begin{cases}\eta_{a}(\mathbf{x}_{\text{robot}}-\mathbf{x})/(G(\mathbf{x}\mid\mathbf{x}_{\mathrm{occ}},\bar{\mathbf{x}}_{\mathrm{tar}})+1),&\text{Filter},\\\eta_{a}(\mathbf{x}-\mathbf{x}^{*})/(G(\mathbf{x}\mid\mathbf{x}_{\mathrm{occ}},\bar{\mathbf{x}}_{\mathrm{tar}})+1),&\text{Plan},\end{cases} \qquad (13)

where \eta_{a} is the attractive gain coefficient, \mathbf{x}_{\text{robot}} is the robot's current position, and G(\mathbf{x}\mid\mathbf{x}_{\mathrm{occ}},\bar{\mathbf{x}}_{\mathrm{tar}}) represents the observation gain of a candidate \mathbf{x} given the occluder position \mathbf{x}_{\mathrm{occ}} and the predicted target position \bar{\mathbf{x}}_{\mathrm{tar}}. Here, \bar{\mathbf{x}}_{\mathrm{tar}} is a predicted position in front of the occluder. Candidates with higher gains minimize occluder overlap in the image space from the robot's camera perspective, facilitating better observation of \bar{\mathbf{x}}_{\mathrm{tar}}.

We follow the assumption from prior works on person tracking [28, 29, 23], where a person is projected into the image as a bounding box based on the camera's pose and projection function \mathrm{Proj}(\cdot,\mathbf{x}). The observation gain is defined as G(\mathbf{x}\mid\mathbf{x}_{\mathrm{occ}},\bar{\mathbf{x}}_{\mathrm{tar}})=1-\mathrm{IoU}(\mathrm{Proj}(\bar{\mathbf{x}}_{\mathrm{tar}},\mathbf{x}),\mathrm{Proj}(\mathbf{x}_{\mathrm{occ}},\mathbf{x})), where \mathrm{IoU}(\cdot,\cdot) is the intersection-over-union of the two bounding boxes. Additionally, the attractive field incorporates the path cost via the direction \mathbf{x}_{\text{robot}}-\mathbf{x}. The combined field is shown in Fig. 4.
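A sketch of the observation gain is shown below; `project` is an assumed camera-projection helper (pose plus intrinsics) returning an image-space bounding box, and `iou` is the standard intersection-over-union, both passed in rather than taken from the paper's codebase.

```python
def observation_gain(candidate, x_occ, x_tar_pred, project, iou):
    """G(x | x_occ, x_bar_tar) from Sec. III-B1: one minus the IoU of the
    occluder's and predicted target's boxes as projected from the
    candidate viewpoint; higher gain means a less occluded view."""
    box_tar = project(x_tar_pred, candidate)   # predicted target's box
    box_occ = project(x_occ, candidate)        # occluder's box
    return 1.0 - iou(box_tar, box_occ)
```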

In the filter step, the observation-based potential field filters sampled points to identify the optimal candidate \mathbf{x}^{*}, which maximizes target observation and minimizes trajectory cost. To accelerate selection and ensure smooth navigation, the search area is restricted to the robot's front, with candidates \{\mathbf{x}_{i}\} sampled using the Korobov lattice [30], a method for generating low-discrepancy points. The optimal candidate \mathbf{x}^{*} is determined as:

\mathbf{x}^{*}=\mathop{\arg\min}_{\{\mathbf{x}_{i}\}}\|\mathbf{F}_{\text{att}}(\mathbf{x}_{i})\|+\|\mathbf{F}_{\text{rep}}(\mathbf{x}_{i})\|. \qquad (14)

In addition, we record its minimum cost as C^{*}. If C^{*}>\delta_{c} and the occluders' maximum velocity is below \delta_{v_{\min}}, the strategy switches to the belief-guided search field (Sec. III-A), suitable for global-scale planning in low-velocity, high-risk environments. If C^{*}>\delta_{c} or the occluders' minimum velocity exceeds \delta_{v_{\max}}, the strategy switches to the fluid-field-based following strategy (Sec. III-B2). Otherwise, \mathbf{x}^{*} becomes the searching point. In the plan step, the attractive and repulsive fields are overlaid, and the gradient of the combined field determines the robot's movement direction to \mathbf{x}^{*}.
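The filter step can be sketched as Korobov-lattice sampling followed by the cost minimization of Eq. (14); the lattice generator value and the cost bookkeeping below are our own assumptions.

```python
import numpy as np

def korobov_points(n, a):
    """2-D Korobov lattice on the unit square: the k-th point is
    ({k/n}, {k*a/n}); the generator `a` is an assumed parameter."""
    k = np.arange(n)
    return np.stack([k / n, (k * a % n) / n], axis=1)

def filter_step(candidates, f_att, f_rep):
    """Eq. (14): pick the candidate minimizing the summed field magnitudes;
    also return its cost C*, used in the strategy-switching checks."""
    costs = np.array([np.linalg.norm(f_att(x)) + np.linalg.norm(f_rep(x))
                      for x in candidates])
    i = int(np.argmin(costs))
    return candidates[i], costs[i]
```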

An example of the overtaking process is shown in Fig. 4. A slowly moving distractor (red square) occludes the robot’s view, causing a target loss. In response, an overtaking field is generated, where lighter colors indicate lower costs. With sufficiently low overtaking cost, the robot selects the optimal goal (orange) from candidates (blue) and overtakes the occluder. Initially, the field is nearly symmetric (first row), with low-cost regions on both sides. As the robot moves to the lower region, the path-cost effect reduces the cost there (darker shade), leading to consistent selection of the same overtaking point and producing a smooth overtaking trajectory (orange path, last row).

III-B2 Fluid field-based following strategy

To ensure natural and safe navigation in dynamic occlusion scenarios, following the occluding person is a viable strategy when overtaking is infeasible, such as when occluders move too fast or overtaking costs are too high, risking unsafe behavior. The goal of this strategy is for the robot to follow the occluders effectively: moving too slowly increases the distance to the occluders and risks losing the target, while overtaking may require acceleration that poses collision risks in human-shared environments.

Inspired by the fluid-based navigation model [31, 16], we propose treating detected pedestrians, including occluders, as a pseudo-fluid. This enables the estimation of a velocity field around the robot’s perception, leading to safe and efficient target tracking. The fluid-based vector field is modeled using the following variables.

Velocity Field \mathrm{v}(\mathbf{x})=(\hat{\mathrm{v}}_{x}(\mathbf{x}),\hat{\mathrm{v}}_{y}(\mathbf{x})): To estimate the velocity at \mathbf{x}, we consider the detected pedestrians Z_{p}=\{(\mathbf{x}_{j},\mathrm{v}_{j})\} and use:

\hat{\mathrm{v}}_{x}(\mathbf{x})=\frac{\sum_{j\in Z_{p}}\alpha\,\mathrm{v}_{x,j}}{\sum_{j\in Z_{p}}\alpha},\quad\hat{\mathrm{v}}_{y}(\mathbf{x})=\frac{\sum_{j\in Z_{p}}\alpha\,\mathrm{v}_{y,j}}{\sum_{j\in Z_{p}}\alpha}, \qquad (15)

where \alpha=\exp(-\gamma\|\mathbf{x}-\mathbf{x}_{j}\|^{2}) and \gamma controls the "viscosity" of the fluid flow [31].

Density Field \rho(\mathbf{x}): To keep the robot from rushing into high-density areas, the density field is defined as:

\rho(\mathbf{x})=\frac{1}{2\pi\sigma^{2}}\frac{1}{N_{\text{obs}}(\mathbf{x})}\sum_{j\in Z_{p}}\exp\left(-\frac{\|\mathbf{x}-\mathbf{x}_{j}\|^{2}}{2\sigma^{2}}\right), \qquad (16)

where N_{\text{obs}}(\mathbf{x}) is the number of times \mathbf{x} has been observed, encouraging navigation toward familiar areas, and \sigma is a constant controlling the density smoothness.

Objective Function \mathcal{O}(\mathbf{x}): The searching point \mathbf{x}^{*} is obtained by minimizing:

\mathbf{x}^{*}=\mathop{\arg\min}_{\{\mathbf{x}_{i}\}}\gamma\,\rho(\mathbf{x})\,\|\mathrm{v}_{\text{robot}}-\mathrm{v}(\mathbf{x})\|, \qquad (17)

where \mathrm{v}_{\text{robot}} is the robot's velocity. Candidate points \{\mathbf{x}_{i}\} are sampled in front of the robot, and the optimal candidate is selected as the searching point. This objective (Eq. 17) guides the robot to follow occluders by minimizing the deviation from their velocity while accounting for their density. Fig. 5 illustrates the process. With the proposed overtaking and fluid-following fields, our method effectively searches for the target person under dynamic occlusions while ensuring safe navigation in complex environments.
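The three fluid-field quantities combine into a single candidate-selection routine, sketched below under our own shape conventions; note that, as in the paper's notation, the same \gamma appears both as the viscosity weight of Eq. (15) and in the objective of Eq. (17).

```python
import numpy as np

def fluid_following_goal(candidates, peds, v_robot,
                         gamma=0.5, sigma=1.0, n_obs=None):
    """Eqs. (15)-(17): pick the candidate minimizing gamma * rho * ||v_r - v||.

    candidates: iterable of (2,) positions sampled in front of the robot
    peds:       list of (position (2,), velocity (2,)) for detected pedestrians
    n_obs:      optional per-candidate observation counts (defaults to 1)
    """
    P = np.array([p for p, _ in peds])
    V = np.array([v for _, v in peds])
    best, best_cost = None, np.inf
    for i, x in enumerate(candidates):
        d2 = np.sum((x - P) ** 2, axis=1)
        alpha = np.exp(-gamma * d2)                         # fluid "viscosity"
        v_hat = (alpha[:, None] * V).sum(0) / alpha.sum()   # Eq. (15)
        count = 1.0 if n_obs is None else n_obs[i]
        rho = (np.exp(-d2 / (2 * sigma ** 2)).sum()
               / (2 * np.pi * sigma ** 2 * count))          # Eq. (16)
        cost = gamma * rho * np.linalg.norm(v_robot - v_hat)  # Eq. (17)
        if cost < best_cost:
            best, best_cost = x, cost
    return best
```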

IV Experiments

This section evaluates the effectiveness of the proposed person-search method, particularly with regard to its performance in re-finding a lost target person due to topographic or dynamic occlusions, by comparing it with existing baseline methods under challenging conditions. The parameters of the proposed method are shown in the supplementary materials.

IV-A Experimental Settings

We evaluated our method in both real-world and simulated environments to ensure its robustness. The simulated settings included three room layouts with different scales: Bookstore (small), Hospital (medium), and Factory (large), as shown in Figs. 6(d), 6(c), and 6(a). These environments featured various objects and both static and dynamic individuals to create frequent occlusions, which are challenging for RPF. Additionally, within the factory, we designed six scenarios to assess the robot's person-search capabilities under topographic and dynamic occlusions (see Figs. 6(a) and 6(b)). These scenarios include difficult U-turns and dynamic occlusions inspired by typical interactions, simulating complex real-life environments. Specifically, we developed three dynamic occlusion experiments: Dyna1 simulates a single-person occlusion at a constant low speed (1.0 m/s); Dyna2 features a single-person occlusion with decreasing speed (from 1.5 m/s to 0.8 m/s); and Dyna3 represents multiple-person occlusions.

In the simulated environments, we use a differential-drive robot with a 2D LiDAR for navigation and an RGB-D camera for perception. A target person is re-identified after being observed in over five consecutive frames. Locations are estimated using YOLOX object detection and depth data. Individuals interact according to the social force model [32], ensuring realistic movements. The target’s start and end positions are randomly generated within free space to create various scenarios. Real-world scenarios are illustrated in Fig. 1. Due to page limitations, real-world qualitative and quantitative experimental results, along with all implementation details, are provided in the supplementary materials.

Refer to caption
(a) Factory Env.
Refer to caption
(b) Dynamic occlusion
Refer to caption
(c) Hospital Env.
Refer to caption
(d) Bookstore Env.
Figure 6: Simulated environments comprise three distinct scales and room structures: (a) Factory (large scale), (c) Hospital (medium scale), and (d) Bookstore (small scale). Six scenarios based on Factory are designed to independently assess the robot’s person-search ability under topographic occlusion (Room1, Room2 and Room3) and dynamic occlusion (Dyna1, Dyna2 and Dyna3), as shown in (a) and (b), respectively. We conducted three dynamic occlusion experiments: Dyna1 involves a single occluder moving, Dyna2 features a single occluder with a gradually decreasing speed, and Dyna3 simulates scenarios with multiple occluders.

IV-B Compared Algorithms and Evaluation Metrics

To validate the effectiveness of our proposed method, we compared it against several existing person-search baselines. The first baseline, GoToLostLocation (GTLL) [33], directs the robot to the target’s last known location when lost and then rotates to search. The second baseline, GoToPredictedLocation (GTPL) [12], uses an SVR-based model to predict and navigate to the target’s location before searching. Given the scarcity of methods targeting person search in unknown environments, we enhanced these baselines by incorporating a frontier map and a greedy next-best-view (NBV) strategy, resulting in GTLL + Greedy-NBV and GTPL + Greedy-NBV, where the robot first navigates to the initial location and then to the nearest frontier. We also integrated the map-based HB-Particle method [4], which directs the robot to the highest-belief particle with particles propagating into unknown regions, and the information gain-based Active Graph-SLAM [8], which employs frontier-based exploration to reduce pose graph uncertainty.

To assess all methods, we used established person-search metrics: Success Rate (SR) [4, 5], Success Weighted by Inverse Path Length (SPL) [34], and Trajectory and Velocity Analysis (TVA). SR measures the percentage of successful searches within a 180-second time limit over 100 runs per scenario. SPL evaluates search efficiency by considering both success and path optimality. TVA examines the smoothness of robot motion by plotting trajectories, velocities, and distances to the target in various dynamic occlusion scenarios.

Refer to caption
Figure 7: Searching keyframes of our method in topographic occlusion scenarios in real-world experiments. The first row shows the third-person view. The second row shows the explored occupancy map, frontiers (red squares), robot and target positions, ego view, and robot trajectories (orange lines). Once the occlusion occurs, trajectory prediction directs the robot toward the region between the left corridor and the room, but our belief-guided graph update integrates new environmental data and the target's motion history to penalize the unlikely left-corridor and room candidates to zero belief (candidates 1, 2, and 3) and focus the search on more promising areas toward the right corridor (candidate 4, with 0.014 belief), where the target is finally re-found.

IV-C Ablation Experiments

IV-C1 Verification of the inference factor and probabilistic inheritance

To assess the roles of the inference factor and probabilistic inheritance in our belief-guided search framework, we conduct an ablation study with results shown in Table I. Removing either component can cause the robot to navigate incorrectly and significantly reduce its performance. For example, in Room3, removing the inference factor reduced SR and SPL to 62% and 60.8%, while removing probabilistic inheritance reduced them to 71% and 68.5%. Similar declines were observed in Factory and Hospital. Visualizations of the baselines and our method are shown in the supplementary materials.

Here, we present a visualization of our real-world experiments (see Fig. 7). Initially, the trajectory prediction guided the robot toward the room and left corridor. However, the inference factor incorporated new environmental data, eliminating the left corridor and room candidates (1, 2, and 3) by reducing their belief to zero. This redirected the robot toward the higher-belief candidates in the right corridor (candidate 4 with a 0.014 belief), ensuring a successful search. Without the inference factor, the robot would have followed the initial trajectory, resulting in a failed search.

Furthermore, probabilistic inheritance enables new candidates to inherit beliefs from their parent candidates with decay. This allows the robot to focus on more probable candidates as the search progresses. As shown in the third and fourth subfigures of Fig. 7, candidates in the right corridor continue to inherit belief, guiding the robot to successfully re-find the target. Without probabilistic inheritance, distant candidates would lose all belief, leading to incorrect navigation and search failure. Our approach leverages the local connectivity of frontiers, optimizing search direction towards likely locations. In conclusion, both the inference factor and probabilistic inheritance are essential for effective navigation and search in complex environments, as demonstrated by improved performance in our ablation study.

TABLE I: Ablation study for our belief-guided field in topographic occlusion scenarios. “W/o inference factor” represents our RPF-Search without the inference factor (Eq. 3) and “w/o probabilistic inheritance” excludes the probabilistic inheritance (Eq. 10). Evaluation metrics include success rate (SR, %) and success weighted by inverse path length (SPL, %). We run 100 times for each scenario.
Method                         Factory (%)    Hospital (%)   Room3 (%)
                               SR     SPL     SR     SPL     SR     SPL
w/o inference factor           68     62.0    75     67.0    62     60.8
w/o probabilistic inheritance  56     46.4    63     54.3    71     68.5
RPF-Search (ours)              100    94.7    98     94.8    100    96.2
TABLE II: Ablation study for our overtaking and fluid-following fields in dynamic occlusion scenarios. W/o obs.-based overtaking field refers to RPF-Search without the observation-field-based overtaking mechanism, while w/o fluid-following field excludes the fluid-following component.
Method                           Dyna1 (%)      Dyna2 (%)      Dyna3 (%)
                                 SR     SPL     SR     SPL     SR     SPL
w/o obs.-based overtaking field  25     24.4    24     23.2    91     87.1
w/o fluid-following field        95     91.6    86     82.3    82     78.5
RPF-Search (ours)                97     92.8    100    95.6    92     88.0
Refer to caption
(a) W/o overtaking (F, failed)
Refer to caption
(b) W/o fluid-following (S, succeeded)
Refer to caption
(c) Ours (S, succeeded)
Refer to caption
(d) Comparison of robot-target distance and velocity across (a), (b), and (c)
Figure 8: Visualization of the ablation study in Dyna1, where a slow-moving distractor occludes the robot. (a) In the “w/o overtaking” scenario, the robot follows the occluder using fluid-following, increasing its distance from the target (blue dashed line in (d)), eventually failing to re-identify the target due to limited visibility. (b) In the “w/o fluid-following” case, the robot overtakes the occluder and quickly resumes tracking, maintaining a consistently close distance to the target (green dashed line in (d)). (c) In the “ours” scenario, the robot detects the slow-moving occluder. It performs an overtaking maneuver, leading to a successful search with a stable and close target distance (purple dashed line in (d)). (d) provides a comparison of (a), (b), and (c) in terms of robot-target distance (dashed lines) and velocity changes (solid lines).
Refer to caption
(a) W/o overtaking (F, failed)
Refer to caption
(b) W/o fluid-following: top (S, succeeded) and bottom (F, failed)
Refer to caption
(c) Ours (S, succeeded)
Refer to caption
(d) Comparison of robot-target distance and velocity
Figure 9: Ablation study visualization in Dyna2 with decreasing occluder speed. In scenario (a) without overtaking, the robot trails the target, fails to re-identify them, and the robot-target distance increases (blue dashed line in (d)). In scenario (b) without fluid-following, two cases are shown: the top example where the robot eventually re-identifies the target but exhibits unstable motion due to frequent speed changes while overtaking a fast occluder (green solid line in (d)), and the bottom example where aggressive navigation leads to collisions and an increased distance (orange dashed line in (d)), resulting in search failure. In scenario (c), our method adapts fluid-following when the occluder is fast and switches to overtaking as it slows, ensuring a smooth trajectory and stable motion (purple solid line in (d)). Panel (d) describes robot-target distance (dashed lines) and velocity changes (solid lines).
Refer to caption
(a) W/o overtaking (S, succeeded)
Refer to caption
(b) W/o fluid-following: top (S, succeeded) and bottom (F, failed)
Refer to caption
(c) Ours (S, succeeded)
Refer to caption
(d) Comparison of robot-target distance and velocity
Figure 10: Ablation study visualization in Dyna3 with multiple occluders. In scenario (a) without overtaking, the robot maintains stable fluid-following behavior, successfully resuming person-following when the target reappears, resulting in a close robot-target distance (blue dashed line) and steady motion (blue solid line) in (d). In scenario (b) without fluid-following, the robot attempts to overtake multiple occluders, leading to unstable motion and frequent shifts in the field of view, sometimes failing to relocate the target when the view diverges (bottom case). In scenario (c), our method detects the absence of clear overtaking opportunities and continues fluid-following until the target becomes visible again, ensuring smooth motion (purple solid line) and maintaining a close robot-target distance (purple dashed line) in (d). Panel (d) describes robot-target distance (dashed lines) and velocity changes (solid lines).

IV-C2 Verification of two field-based strategies

We evaluated the effectiveness of the overtaking and fluid-following fields through an ablation study, with outcomes shown in Table II. In Dyna1, removing the fluid-following field decreased the SR by only 2%, whereas removing the observation-based overtaking field caused a 72% drop in SR. Without overtaking, the robot followed a slow-moving occluder, increasing the target distance and failing to re-identify the target when it turned a corner (Fig. 8(a)). In contrast, the overtaking field enabled the robot to overtake the occluder, maintaining a stable and close distance to the target and successfully completing the search (Fig. 8(b)).

In Dyna2, where the occluder's speed decreases from 1.5 m/s to 0.8 m/s, the absence of either field reduced performance. Specifically, without the overtaking field, SR and SPL decreased by 76% and 72.4%, respectively; without the fluid-following field, they decreased by 14% and 13.3% (Table II). Simply following the occluder in this scenario increases the robot-target distance, risking loss of the target (see Fig. 9(a)). Conversely, direct overtaking can lead to unstable motion or collisions, especially when pursuing a fast-moving occluder (see Fig. 9(b)). Our complete method addresses these issues by overtaking only when clear opportunities arise, such as when the overtaking cost is low and the occluder is moving slowly. This approach ensures smooth trajectories and stable motion, as demonstrated in Figs. 9(c) and 9(d).

In Dyna3, involving multiple occluders, the absence of the fluid-following field led to frequent and unstable overtaking attempts, resulting in unstable motion and frequent velocity changes (see Fig. 10(b)). In contrast, our complete method employs a fluid-following strategy when overtaking is not feasible, ensuring smooth navigation and successful recovery once the target reappears after multi-person occlusion (refer to Figs. 10(c) and 10(d)). Overall, the overtaking field is essential for maintaining target tracking, while the fluid-following field ensures safe and stable navigation when overtaking is not possible. The combination of both fields significantly enhances person-search ability under dynamic occlusions.

TABLE III: Person-search performance was evaluated in our self-built three scenarios to independently assess search capabilities under topographic occlusion (Room1, Room2, and Room3). Each scenario was run 100 times. The best result is indicated in bold, while the second-best result is underscored.
Method                      Room1          Room2          Room3
                            SR     SPL     SR     SPL     SR     SPL
GTLL[33]                    8      8.0     94     79.6    1      1.0
GTPL[12]                    78     73.4    62     53.3    45     41.8
GTLL[33] + Greedy-NBV       77     74.1    94     80.5    65     57.9
GTPL[12] + Greedy-NBV       94     84.6    79     68.0    42     40.2
Active Graph-SLAM[8]        33     33.0    14     21.6    61     60.8
HB-Particle[4]              2      2.0     14     12.2    0      0.0
RPF-Search (ours)           100    96.1    100    97.1    100    96.2
TABLE IV: Person-search performance in dynamic occlusion scenarios.
Method                      Dyna1          Dyna2          Dyna3
                            SR     SPL     SR     SPL     SR     SPL
GTLL[33]                    0      0.0     0      0.0     0      0.0
GTPL[12]                    8      8.0     5      4.9     0      0.0
GTLL[33] + Greedy-NBV       51     48.8    79     75.7    10     9.9
GTPL[12] + Greedy-NBV       42     41.3    29     28.8    34     33.3
Active Graph-SLAM[8]        4      4.0     1      1.0     1      1.0
HB-Particle[4]              34     33.2    12     11.5    1      0.9
RPF-Search (ours)           97     92.8    100    95.6    92     88.0
TABLE V: Person search performance in simulated environments with different scales and room structures involving static and dynamic pedestrians.
Method                      Factory        Hospital       Bookstore
                            SR     SPL     SR     SPL     SR     SPL
GTLL[33]                    34     32.6    32     31.0    58     56.0
GTPL[12]                    80     73.9    49     46.6    50     48.8
GTLL[33] + Greedy-NBV       61     61.5    84     81.4    88     81.5
GTPL[12] + Greedy-NBV       73     67.6    65     55.9    70     62.9
Active Graph-SLAM[8]        19     18.4    49     48.0    67     66.2
HB-Particle[4]              8      6.9     7      4.3     83     69.7
RPF-Search (ours)           100    94.7    98     94.8    98     95.7

IV-D Topographic occlusion scenarios

To independently evaluate the person-search capabilities of both the baseline methods and our approach, we designed three factory scenarios (Room1, Room2, and Room3) focusing on topographic occlusion challenges, including regular paths and U-turns. As shown in Table III, our method achieved the highest SR and SPL in all rooms: 100% SR and 96.1% SPL in Room1, 100% SR and 97.1% SPL in Room2, and 100% SR and 96.2% SPL in Room3. These results outperform the second-best methods by up to +35% in SR and +35.4% in SPL, demonstrating that our approach is more effective in re-identifying targets and enables robots to follow near-optimal paths, reducing unnecessary movements and search time.

Among the baselines, “GTPL + Greedy-NBV” performed well in Room1 with 94% SR and 84.6% SPL by accurately predicting target locations in simpler environments. “GTLL + Greedy-NBV” achieved 94% SR and 80.5% SPL in Room2 since the lost target is usually within the room, allowing the greedy search to effectively locate the person. However, in Room3, limited historical information led “GTPL + Greedy-NBV” to predict trajectories toward the corridor, resulting in a greedy search away from the target room and achieving only 42% SR and 40.2% SPL. Similarly, “GTLL + Greedy-NBV” reached low performance with 65% SR and 57.9% SPL.

In contrast, our method transforms predicted trajectories into probabilistic representations and uses a belief-guided search field that dynamically updates the estimated location by incorporating new environmental data and the target’s past motion cues. This approach allows efficient searching and locating of the person, achieving 100% SR and 96.2% SPL in Room3, significantly outperforming the baseline methods.

IV-E Dynamic Occlusion Scenarios

Dynamic occlusions are common in everyday person-following tasks, especially in crowded environments. While robots can often recover tracking after short-term occlusions, since the target person reappears quickly, the greatest challenge arises with long-term occlusions, where other pedestrians obstruct the target for extended periods. As introduced in Fig. 6(b), we designed three dynamic occlusion experiments (Dyna1, Dyna2, and Dyna3) to simulate realistic long-term occlusion scenarios. In these scenarios, the target person follows a fixed path into Room1 and is occluded for an extended period in the corridor by other individuals. The optimal search behavior should consider both search efficiency and safe navigation, enabling an intelligent switch between overtaking and following the occluders.

As shown in Table IV, our method achieves superior performance across all dynamic occlusion scenarios. Specifically, we obtained 97% SR and 92.8% SPL in Dyna1, 100% SR and 95.6% SPL in Dyna2, and 92% SR and 88.0% SPL in Dyna3. Existing person-search methods do not explicitly address dynamic occlusion problems and, consequently, perform poorly in these environments. Our method surpasses the second-best baseline by +46% SR and +44.0% SPL in Dyna1, +21% SR and +19.9% SPL in Dyna2, and +58% SR and +54.7% SPL in Dyna3.

IV-F Scenarios with different scales and room structures

The evaluation results for different scales and room structures are shown in Table V: our RPF-Search achieves the highest performance, with 100% SR and 94.7% SPL in Factory, 98% SR and 94.8% SPL in Hospital, and 98% SR and 95.7% SPL in Bookstore. Person search in small-scale environments (e.g., Bookstore) is relatively straightforward, with five baselines achieving an SR above 50% and the second-best method reaching an SR of 88% and an SPL of 81.5%. This may be because the target person is more easily observed as the robot explores the smaller Bookstore environment. In contrast, person search becomes more challenging in larger environments (e.g., Hospital and Factory), where incorrect decisions can lead the robot away from the target person. Consequently, only two baselines achieve above 50% SR in Hospital, and three achieve above 50% SR in Factory.

Our method, however, consistently outperforms the second-best approach, with improvements of +20% in terms of SR and +20.8% in terms of SPL in Factory, +14% SR and +13.4% SPL in Hospital, and +10% SR and +14.2% SPL in Bookstore. This demonstrates the effectiveness of our belief-guided field, which leverages newly observed environmental information and the target person’s past motion cues to update the belief of the target person’s location. This approach guides the robot to explore the most probable areas, ultimately leading to a successful re-identification of the target person.

V Conclusion

In this paper, we introduce RPF-Search, a novel heuristic-guided, field-based person-search framework designed for robust robot person-following in dynamic and unknown environments. This approach addresses key challenges in human-robot interaction, enabling robots to reliably locate a target person despite frequent occlusions. By constructing an environmental map in real-time and leveraging the motion cues of detected individuals, RPF-Search effectively adapts to both topographic occlusions—utilizing a belief-guided search field—and dynamic occlusions through an observation-based potential field combined with fluid-field dynamics. Experimental results in both simulated and real-world scenarios demonstrate RPF-Search’s significant improvements in search efficiency and success rates compared to traditional person-search methods. This advancement offers a practical and reliable solution for personal assistance and human-robot collaboration. Future work could further enhance these results by incorporating advanced trajectory prediction models and exploring the use of semantic cues.

References

  • [1] Q. Yan, J. Huang, Z. Yang, Y. Hasegawa, and T. Fukuda, “Human-following control of cane-type walking-aid robot within fixed relative posture,” IEEE/ASME Trans. Mechatronics, vol. 27, no. 1, pp. 537–548, 2021.
  • [2] J. Chen and W.-j. Kim, “A human-following mobile robot providing natural and universal interfaces for control with wireless electronic devices,” IEEE/ASME Trans. Mechatronics, vol. 24, no. 5, pp. 2377–2385, 2019.
  • [3] V. Isler, S. Kannan, and S. Khanna, “Randomized pursuit-evasion in a polygonal environment,” IEEE Trans. Robot., vol. 21, no. 5, pp. 875–884, 2005.
  • [4] A. Goldhoorn, A. Garrell, R. Alquézar, and A. Sanfeliu, “Searching and tracking people in urban environments with static and dynamic obstacles,” Robot. and Auton. Syst., vol. 98, pp. 147–157, 2017.
  • [5] A. Bayoumi, P. Karkowski, and M. Bennewitz, “Speeding up person finding using hidden markov models,” Robot. and Auton. Syst., vol. 115, pp. 40–48, 2019.
  • [6] L. Bruckschen, K. Bungert, N. Dengler, and M. Bennewitz, “Predicting human navigation goals based on bayesian inference and activity regions,” Robot. and Auton. Syst., vol. 134, p. 103664, 2020.
  • [7] N. Basilico and F. Amigoni, “Exploration strategies based on multi-criteria decision making for searching environments in rescue operations,” Autonomous Robots, vol. 31, pp. 401–417, 2011.
  • [8] J. A. Placed and J. A. Castellanos, “A general relationship between optimality criteria and connectivity indices for active graph-slam,” IEEE Robot. Autom. Lett., vol. 8, no. 2, pp. 816–823, 2023.
  • [9] F. Niroui, K. Zhang, Z. Kashino, and G. Nejat, “Deep reinforcement learning robot for search and rescue applications: Exploration in unknown cluttered environments,” IEEE Robot. Autom. Lett., vol. 4, no. 2, pp. 610–617, 2019.
  • [10] R. Algabri and M.-T. Choi, “Target recovery for robust deep learning-based person following in mobile robots: Online trajectory prediction,” Applied Sciences, vol. 11, no. 9, p. 4165, 2021.
  • [11] B.-J. Lee, J. Choi, C. Baek, and B.-T. Zhang, “Robust human following by deep bayesian trajectory prediction for home service robots,” in Proc. IEEE Int. Conf. Robot. Autom., 2018, pp. 7189–7195.
  • [12] M. Kim, M. Arduengo, N. Walker, Y. Jiang, J. W. Hart, P. Stone, and L. Sentis, “An architecture for person-following using active target search,” arXiv preprint arXiv:1809.08793, 2018.
  • [13] A. Goldhoorn, A. Garrell, R. Alquézar, and A. Sanfeliu, “Continuous real time POMCP to find-and-follow people by a humanoid service robot,” in IEEE Int. Conf. Humanoid Robots, 2014, pp. 741–747.
  • [14] M. Do Hoang, S.-S. Yun, and J.-S. Choi, “The reliable recovery mechanism for person-following robot in case of missing target,” in IEEE Int. Conf. Ubiquitous Robots and Ambient Intelligence (URAI), 2017, pp. 800–803.
  • [15] P. Henry, C. Vollmer, B. Ferris, and D. Fox, “Learning to navigate through crowded environments,” in Proc. IEEE Int. Conf. Robot. Autom., 2010, pp. 981–986.
  • [16] K. Cai, D. Dugas, R. Siegwart, et al., “Sampling-based path planning in highly dynamic and crowded pedestrian flow,” IEEE Trans. Intell. Transp. Syst., vol. 24, no. 12, pp. 14732–14742, 2023.
  • [17] D. Dugas, K. Cai, O. Andersson, N. Lawrance, R. Siegwart, and J. J. Chung, “Flowbot: Flow-based modeling for robot navigation,” in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst., 2022, pp. 8799–8805.
  • [18] J. Saraydaryan, F. Jumel, and O. Simonin, “Navigation in human flows: planning with adaptive motion grid,” in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst. (IROS) Workshop CrowdNav, 2018.
  • [19] S. S. Lodhi, N. Kumar, and P. K. Pandey, “Autonomous vehicular overtaking maneuver: A survey and taxonomy,” Vehicular Communications, vol. 42, p. 100623, 2023.
  • [20] X. Chen, J. Wei, X. Ren, K. H. Johansson, and X. Wang, “Automatic overtaking on two-way roads with vehicle interactions based on proximal policy optimization,” in IEEE Intelligent Vehicles Symposium (IV), 2021, pp. 1057–1064.
  • [21] W. Schwarting, J. Alonso-Mora, L. Pauli, S. Karaman, and D. Rus, “Parallel autonomy in automated vehicles: Safe motion generation with minimal intervention,” in Proc. IEEE Int. Conf. Robot. Autom., 2017, pp. 1928–1935.
  • [22] Q. Ma, M. Li, G. Huang, et al., “Overtaking path planning for CAV based on improved artificial potential field,” IEEE Trans. Veh. Technol., 2023.
  • [23] H. Ye, J. Zhao, Y. Zhan, W. Chen, L. He, and H. Zhang, “Person re-identification for robot person following with online continual learning,” IEEE Robot. Autom. Lett., pp. 1–8, 2024.
  • [24] A. Hornung, K. M. Wurm, M. Bennewitz, C. Stachniss, and W. Burgard, “OctoMap: An efficient probabilistic 3D mapping framework based on octrees,” Autonomous Robots, 2013.
  • [25] K. Cai, W. Chen, C. Wang, S. Song, and M. Q.-H. Meng, “Human-aware path planning with improved virtual doppler method in highly dynamic environments,” IEEE Trans. Autom. Sci. Eng., vol. 20, no. 2, pp. 1304–1321, 2022.
  • [26] C. Fulgenzi, A. Spalanzani, C. Laugier, and C. Tay, “Risk based motion planning and navigation in uncertain dynamic environment,” Inria, Research Report, Oct. 2010.
  • [27] K. Cai, R. Laha, Y. Gong, L. Chen, L. Zhang, L. F. Figueredo, and S. Haddadin, “Demonstration to adaptation: A user-guided framework for sequential and real-time planning,” in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst., 2024.
  • [28] K. Koide and J. Miura, “Identification of a specific person using color, height, and gait features for a person following robot,” Robot. and Auton. Syst., vol. 84, pp. 76–87, 2016.
  • [29] H. Ye, J. Zhao, Y. Pan, W. Chen, L. He, and H. Zhang, “Robot person following under partial occlusion,” in Proc. IEEE Int. Conf. Robot. Autom., 2023, pp. 7591–7597.
  • [30] N. M. Korobov, “Number-theoretic methods in approximate analysis,” 1963.
  • [31] D. Dugas, K. Cai, O. Andersson, N. Lawrance, R. Siegwart, and J. J. Chung, “Flowbot: Flow-based modeling for robot navigation,” in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst., 2022, pp. 8799–8805.
  • [32] D. Helbing and P. Molnar, “Social force model for pedestrian dynamics,” Physical Review E, vol. 51, no. 5, p. 4282, 1995.
  • [33] B. X. Chen, R. Sahdev, and J. K. Tsotsos, “Integrating stereo vision with a CNN tracker for a person-following robot,” in Int. Conf. Comput. Vis. Syst. (ICVS), Springer, 2017, pp. 300–313.
  • [34] P. Anderson, A. Chang, et al., “On evaluation of embodied navigation agents,” arXiv preprint arXiv:1807.06757, 2018.
[Uncaptioned image] Hanjing Ye received his B.E. and M.S. degrees from Guangdong University of Technology in 2019 and 2022, respectively. He is currently pursuing his Ph.D. degree in Robotics at the Southern University of Science and Technology. His research interests involve robot person following, autonomous navigation and human-robot interaction.
[Uncaptioned image] Kuanqi Cai is currently a Research Fellow at the HRII Lab, Italian Institute of Technology (IIT). Prior to joining IIT, he was a Research Associate at the Technical University of Munich, from 2023 to 2024. In 2021, he was honored as a Robotics Student Fellow at ETH Zurich. He obtained his B.E. degree from Hainan University in 2018 and his M.E. degree from Harbin Institute of Technology in 2021. His current research interests focus on motion planning and human-robot interaction.
[Uncaptioned image] Yu Zhan received his B.E. degree in microelectronics from Wuhan University in 2022. He is currently pursuing his Master’s degree from the Southern University of Science and Technology. His research interests include human pose estimation, human-robot interaction and robot vision.
[Uncaptioned image] Bingyi Xia received his B.E. and M.S. degrees from the Southern University of Science and Technology in 2020 and 2023, respectively. He is currently pursuing his Ph.D. degree with the Department of Electronic and Electrical Engineering at Southern University of Science and Technology, Shenzhen, China. His research interests include robot motion planning.
[Uncaptioned image] Arash Ajoudani is the director of the Human-Robot Interfaces and Interaction (HRI²) laboratory at IIT and a recipient of the ERC grants Ergo-Lean (2019) and Real-Move (2023). He has coordinated multiple Horizon 2020 and Horizon Europe projects and received numerous awards, including the IEEE Robotics and Automation Society (RAS) Early Career Award (2021) and the KUKA Innovation Award (2018). A Senior Editor of IJRR and a member of the IEEE RAS Administrative Committee (2022-2024), his research focuses on physical human-robot interaction, mobile manipulation, and adaptive control.
[Uncaptioned image] Hong Zhang received his PhD in Electrical Engineering from Purdue University in 1986. He was a Professor in the Department of Computing Science, University of Alberta, Canada for over 30 years before he joined the Southern University of Science and Technology (SUSTech), China, in 2020, where he is currently a Chair Professor. Dr. Zhang served as the Editor-in-Chief of the IROS Conference Paper Review Board (2020-2022) and is currently a member of the IEEE Robotics and Automation Society Administrative Committee (2023-25). He is a Fellow of IEEE and a Fellow of the Canadian Academy of Engineering. His research interests include robotics, computer vision, and image processing.

APPENDIX

V-A Methodology Supplements

We additionally provide the procedures of our belief-guided search field and observation-based search field in Algorithm 1 and Algorithm 2, respectively.

Input: Occupancy map $\mathcal{M}_t$, target person's historical trajectory $\mathcal{H}_t$, robot position in the map frame $\mathbf{p}_{r,t}$
Output: Optimal search goal $\mathbf{x}^*$

1:  $\{\mathcal{F}_i\}_t = \mathrm{FrontierDetection}(\mathcal{M}_t)$
2:  // For probabilistic inheritance
3:  $\mathrm{Parent}(\mathcal{F}_{i,t}) = \mathrm{FindParent}(\mathcal{F}_{i,t})$, with $\mathrm{Parent}(\mathcal{F}_{i,t}) \in \{\mathcal{F}_i\}_{t-1}$
4:  // For target-person existence
5:  $\{\bar{\mathbf{u}}_i\}_{i=1}^{N} = \mathrm{TrajectoryPrediction}(\mathcal{H}_t)$
6:  $\mathcal{V}_e(\cdot) = \mathrm{ProbPropagation}(\{\bar{\mathbf{u}}_i\}_{i=1}^{N})$
7:  // Candidate weight calculation
8:  for each $\mathcal{F}_{i,t} \in \{\mathcal{F}_i\}_t$ do
9:      Inference factor $\eta_i = \mathrm{Simulation}(\mathcal{H}_t[-1], \mathcal{F}_{i,t})$  → Eq. 3
10:     Information gain $b_g = \mathcal{V}_g(\mathcal{F}_{i,t}, \mathcal{M}_t)$  → Eq. 4
11:     Collision risk $b_c = \mathcal{V}_c(\mathcal{F}_{i,t}, \mathcal{M}_t)$  → Eq. 5
12:     Target-person existence $b_e = \mathcal{V}_e(\mathcal{F}_{i,t})$  → Eq. 8
13:     Probabilistic inheritance $b_p = \mathcal{V}_p(\mathcal{F}_{i,t}, \mathrm{Parent}(\mathcal{F}_{i,t}))$  → Eq. 10
14:     $W_{i,t} = \eta_i\,(b_c + b_g + b_e + b_p)$  → Eq. 2
15: end for
16: $\mathbf{x}^* = \arg\max_{\mathbf{x}} \big\{ W_i \exp\big(-\frac{\|\mathbf{x}-\mathcal{F}_i\|^2}{2\sigma^2}\big) \big\}$  → Eq. 1
17: return $\mathbf{x}^*$

Algorithm 1: Belief-guided search field
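To make the weighting and selection steps concrete, the following minimal Python sketch mirrors lines 8–16 of Algorithm 1 under simplifying assumptions: the per-candidate terms ($\eta_i$, $b_c$, $b_g$, $b_e$, $b_p$) are assumed to be precomputed by the respective field components, and the Eq. 1 field is evaluated only at the frontier locations themselves. All function and variable names are ours, not the paper's API.

import numpy as np

# Minimal sketch of Algorithm 1, lines 8-16: combine per-candidate belief
# terms into weights (Eq. 2) and pick the goal from the Gaussian-smoothed
# belief field (Eq. 1). All names here are illustrative.
def select_search_goal(frontiers, eta, b_c, b_g, b_e, b_p, sigma=1.0):
    # frontiers: (K, 2) candidate positions; eta and b_*: (K,) arrays.
    weights = eta * (b_c + b_g + b_e + b_p)                 # Eq. 2
    # Evaluate the smoothed field at each frontier, take the maximizer.
    diffs = frontiers[:, None, :] - frontiers[None, :, :]   # (K, K, 2)
    gauss = np.exp(-np.sum(diffs ** 2, axis=-1) / (2 * sigma ** 2))
    field = gauss @ weights                                 # Eq. 1 (discretized)
    return frontiers[int(np.argmax(field))]

# Example with three dummy frontier candidates:
goal = select_search_goal(
    frontiers=np.array([[1.0, 0.0], [4.0, 2.0], [0.0, 5.0]]),
    eta=np.array([1.0, 0.2, 0.8]),
    b_c=np.full(3, 0.1), b_g=np.full(3, 0.3),
    b_e=np.array([0.5, 0.0, 0.2]), b_p=np.full(3, 0.05),
)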
Input: Occupancy map $\mathcal{M}_t$, occluders' historical trajectories $\{\mathcal{H}_{\mathrm{occ},i}\}_t$, robot position in the map frame $\mathbf{p}_{r,t}$, robot velocity $v_{\mathrm{robot}}$, ‘viscosity’ of the fluid flow $\gamma$, velocity thresholds $\delta_{v_{\min}}$ and $\delta_{v_{\max}}$, cost threshold $\delta_c$
Output: Optimal search goal $\mathbf{x}^*$

1:  $\{\mathbf{x}_i\} = \mathrm{Sample}(\mathcal{M}_t, \{\mathcal{H}_{\mathrm{occ},i}\}_t)$
2:  Occluders' velocities $\{v_{\mathrm{occ},i}\}_t = \mathrm{CalVelocity}(\{\mathcal{H}_{\mathrm{occ},i}\}_t)$
3:  // Observation-based potential-field overtaking
4:  $\{\bar{\mathbf{p}}_{i,j}\}_{j=1}^{N} = \mathrm{TrajectoryPrediction}(\{\mathcal{H}_{\mathrm{occ},i}\}_t)$
5:  $\mathbf{F}_{\mathrm{rep}}(\cdot) = \mathrm{CalRepulsiveForce}(\{\bar{\mathbf{p}}_{i,j}\}_{j=1}^{N}, \mathcal{M}_t)$
6:  $\mathbf{F}_{\mathrm{att}}(\cdot) = \mathrm{CalAttractiveForce}(\{\bar{\mathbf{p}}_{i,j}\}_{j=1}^{N}, \mathbf{p}_{r,t})$
7:  $(\mathbf{x}^*, C^*) = \big(\arg\min_{\{\mathbf{x}_i\}}, \min\big)\big(\|\mathbf{F}_{\mathrm{att}}(\mathbf{x}_i)\| + \|\mathbf{F}_{\mathrm{rep}}(\mathbf{x}_i)\|\big)$
8:  if $\max(\{v_{\mathrm{occ},i}\}_t) < \delta_{v_{\min}}$ and $C^* > \delta_c$ then
9:      // Switch to belief-guided searching
10:     Get $\mathbf{x}^*$ from Algorithm 1
11: else if $\min(\{v_{\mathrm{occ},i}\}_t) > \delta_{v_{\max}}$ or $C^* > \delta_c$ then
12:     // Fluid field-based following
13:     $\mathrm{v}(\cdot) = \mathrm{CalVelocityField}(\{v_{\mathrm{occ},i}\}_t, \{\mathcal{H}_{\mathrm{occ},i}\}_t)$
14:     $\rho(\cdot) = \mathrm{CalDensityField}(\{\mathcal{H}_{\mathrm{occ},i}\}_t)$
15:     $\mathbf{x}^* = \arg\min_{\{\mathbf{x}_i\}}\big(\gamma\,\rho(\mathbf{x}_i)\,(v_{\mathrm{robot}} - \mathrm{v}(\mathbf{x}_i))\big)$
16: end if
17: return $\mathbf{x}^*$

Algorithm 2: Observation-based search field
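The decision structure of Algorithm 2 can be summarized in a short Python sketch. Here the force, density, and flow-velocity fields are passed in as callables supplied by the respective components; their names, and the treatment of velocities as scalar speeds, are our simplifications rather than the paper's implementation.

import numpy as np

# Schematic of Algorithm 2: evaluate the overtaking cost at every sampled
# candidate, then branch between belief-guided search, fluid-field
# following, and direct overtaking. All helpers are placeholders.
def observation_based_search(samples, F_att, F_rep, v_occ, v_robot,
                             rho, v_flow, gamma,
                             delta_v_min, delta_v_max, delta_c,
                             belief_guided_goal):
    costs = np.array([np.linalg.norm(F_att(x)) + np.linalg.norm(F_rep(x))
                      for x in samples])                    # line 7
    x_star, c_star = samples[int(np.argmin(costs))], costs.min()

    if max(v_occ) < delta_v_min and c_star > delta_c:
        return belief_guided_goal()                         # lines 8-10
    if min(v_occ) > delta_v_max or c_star > delta_c:
        # Fluid following: minimize gamma * rho(x) * (v_robot - v(x)).
        drag = [gamma * rho(x) * (v_robot - v_flow(x)) for x in samples]
        return samples[int(np.argmin(drag))]                # lines 11-15
    return x_star                                           # overtake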

V-B Implementation Details and Metrics

Simulated environments:

  Size (m×m): Factory 40×70, Hospital 26×50, Bookstore 14×13

  Scenario                   Room1   Room2   Room3   Dyna1   Dyna2   Dyna3   Factory   Hospital   Bookstore
  Num. of people                 1       1       1       9       9      11        20         20           9
  Target's path length (m)    40.0    55.0    50.0    57.0    57.0    57.0      48.3       28.6        17.9

  Robot: Kobuki differential robot with a maximum speed of 2.0 m/s
  Sensors: RGB-D camera (120° FoV, 0.03–10 m) and 2D laser rangefinder (0–20 m)

Real-world environments:

  Scenario                   Large U-turn   Long Hallway (S)   Long Hallway (M)
  Size (m×m)                         20×3               50×3               50×3
  Num. of people                        1                  2                  4
  Target's path length (m)             15                 42                 42

  Robot: Scout-Mini differential robot with a maximum speed of 1.8 m/s
  Sensors: Stereo camera (120° FoV, 0.03–10 m)

TABLE VI: Parameters of simulated and real-world environments.
Parameter   $v_{\max}$   $\lambda$   $d_s$   $d_0$   $\eta_f$   $\eta_a$   $\delta_c$   $\delta_{v_{\min}}$
Value       1.5 m/s      0.01        1.5 m   1.3 m   1.1        0.9        10.0         1.3 m/s

TABLE VII: Parameters of the proposed method.

The parameters of the proposed method are listed in Table VII. In the simulated environments, our differential-drive robot is equipped with a 2D laser rangefinder (0–20 m) for navigation and an RGB-D camera (120° FoV, 0.03–10 m) for perception. The robot's maximum velocity is 2.0 m/s, and the simulated people's speeds range from 0.8 to 1.7 m/s. In our real-world experiments, we employ a Scout-Mini differential-drive robot equipped with a stereo camera that offers a 120° horizontal field of view (FoV) and a range of 0.03–10 m, providing both RGB and depth images. To maintain consistency with our simulations, all planners and controllers are configured identically. For target-person tracking and re-identification, we adopt the vision-based tracking method proposed in [23]. All parameters of the simulated and real-world environments are listed in Table VI.
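For reproducibility, the Table VII values can be collected into a single configuration structure, as in the Python sketch below; the key names and grouping are ours, and the roles of $\lambda$, $d_s$, $d_0$, $\eta_f$, and $\eta_a$ are defined in the main text rather than here.

# Table VII parameters gathered into one config dict (key names are ours).
RPF_SEARCH_PARAMS = {
    "v_max": 1.5,         # m/s
    "lambda": 0.01,
    "d_s": 1.5,           # m
    "d_0": 1.3,           # m
    "eta_f": 1.1,
    "eta_a": 0.9,
    "delta_c": 10.0,      # overtaking cost threshold (Algorithm 2)
    "delta_v_min": 1.3,   # m/s, velocity threshold (Algorithm 2)
}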

We additionally evaluate the following metric. Euclidean Distance to Target Over Time (EDTOT): an effective person-search approach should let the robot quickly recover person-following, yielding a smaller average distance to the target; we therefore measure the Euclidean distance between the robot and the target person at every timestep of each episode.
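Computed from synchronized pose logs, the metric reduces to a per-timestep distance average. The minimal Python sketch below assumes each trajectory is an N×2 array of planar positions, which is our formatting choice rather than the paper's.

import numpy as np

# EDTOT for one episode: mean Euclidean distance between synchronized
# robot and target positions (both given as N x 2 arrays).
def edtot(robot_traj: np.ndarray, target_traj: np.ndarray) -> float:
    dists = np.linalg.norm(robot_traj - target_traj, axis=1)
    return float(dists.mean())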

V-C Real-world Experiments

Figure 11: Real-world experimental results. Each method was executed ten times for each scenario. ‘S’ denotes single-person occlusion, while ‘M’ indicates multiple-person occlusion.
Figure 12: Keyframes from our method during real-world experiments involving single-person occlusion in overtaking scenarios. When the target person is occluded by a pedestrian, an observation-based overtaking field is constructed. Overtaking candidates (represented by yellow points) are sampled and evaluated within this field. The candidate with the minimum cost, provided that the cost is below a predefined threshold, is selected as the next search point. Through continuous selection of search points and robot navigation, the robot smoothly performs the overtaking maneuver and successfully re-finds the target person.
Figure 13: Keyframes from our method during real-world experiments involving multi-person occlusion in a narrow corridor. When the target person is occluded due to dynamic obstructions, an overtaking field is first constructed. However, because the minimum overtaking cost exceeds a predefined threshold, a fluid field based on the motion cues of visible occluders is created to ensure safe navigation and facilitate target searching. The robot follows the fluid field (shown in the second column) until an overtaking opportunity arises (depicted in the third column), enabling the robot to re-find the target person.

The real-world experiments were conducted in an indoor office environment featuring a long hallway and a large U-turn, specifically designed to test the algorithm under dynamic and topographic occlusions. In the topographic occlusion experiment (Fig. 1(a)), our proposed algorithm demonstrated clear advantages over GTLL [33] + Greedy-NBV and GTPL [12] + Greedy-NBV, the best-performing algorithms in the earlier comparisons. Notably, our approach achieved perfect tracking recovery with ten successful runs out of ten, eight more than GTLL and seven more than GTPL, as shown in Fig. 11. This gap stems from the baselines' reliance on local greedy search, which often yields suboptimal paths such as frequent detours or overly conservative trajectories. In contrast, our algorithm leverages a probabilistic distribution to optimize the search process, reducing unnecessary exploration and producing more efficient and natural navigation paths.

In the dynamic occlusion experiments (Figs. 1(b) and 1(c)), our algorithm consistently outperformed the compared methods. Unlike existing approaches, which struggle with dynamic occlusions, ours integrates adaptive strategies, namely fluid-following and overtaking, that address varying occlusion scenarios in real-world environments. These strategies significantly improve the robot's ability to re-find the target person after occlusions, resulting in a higher success rate: our method achieved 7 more successes in Long Hallway (S) and 8 more successes in Long Hallway (M) than the variant without adaptive strategies (Fig. 11).

Visualizations of the single-person and multi-person occlusion experiments are provided in Fig. 12 and Fig. 13, respectively. In the single-person occlusion scenario, an observation-based overtaking field is constructed using the motion cues of the occluder. Within this field, overtaking candidates (indicated by yellow points in the figure) are sampled and evaluated. The candidate with the minimum cost, as long as it is below a predefined threshold, is selected as the next search point. Through continuous selection of search points and navigation, the robot smoothly overtakes the occluder and successfully re-finds the target person.

In the multi-person occlusion scenario, our method first constructs the overtaking field. However, because the minimum overtaking cost exceeds a predefined threshold, a fluid field is created using the motion cues of visible occluders to ensure safe navigation and facilitate target searching. The robot follows the fluid field (shown in the second column) until an overtaking opportunity arises (shown in the third column), allowing the robot to resume tracking the target.

Figure 14 (panels (a) Room1, (b) Room2, (c) Room3, (d) Dyna1, (e) Dyna2, (f) Dyna3): Euclidean distance between the robot and the target person over time. When the target person was lost, the fluctuations in distance under our method were smaller than those of the other baselines, indicating that our approach could quickly navigate in the correct direction to search for the target. Compared to the baselines, our method also converges to the person-following distance more rapidly. The average distance, calculated over 100 runs, is shown in the legend; our method consistently achieves the best performance, with average distances of 3.7 m in Room1, 3.4 m in Room2, 6.4 m in Room3, 5.2 m in Dyna1, 5.0 m in Dyna2, and 5.5 m in Dyna3.

V-D Simulated Experiments

For topographic occlusion, as observed in Figs. 14(a), 14(b), and 14(c), with our RPF-Search method the distance between the robot and the person quickly converges to the desired following distance after tracking loss. Consequently, as indicated in the legends of these figures, our method achieves the smallest average Euclidean distance to the target, with 3.7 m in Room1, 3.4 m in Room2, and 6.4 m in Room3, surpassing the second-best baseline by 1.8 m, 0.8 m, and 3.0 m, respectively. This demonstrates that our method helps the robot quickly re-find the target person and return to the person-following stage.

Moreover, we provide a person-search visualization of the baselines and our method in Room3 in Fig. 15, where the target person rounds a U-turn corner and enters a room. The visualizations of our method appear in the last two rows of Fig. 15. Initially, the trajectory prediction (depicted as a red line in the fourth row) directs the robot toward the corridor. However, our method effectively utilizes newly observed environmental information to penalize infeasible candidates through the inference factor (Eq. 3), which encodes the reasoning that if the target person were indeed moving toward the corridor, the robot should have observed them by now. Consequently, as shown in the second belief-guided graph, the belief associated with the corridor candidate drops to zero; specifically, the belief of ‘Cand. 2’ is zero in the second graph. This prompts the robot to redirect its search toward higher-belief candidates within the room, i.e., ‘Cand. 3’, which holds a belief value of 0.022 in the second graph. Without this inference factor, the robot would pursue the target person into the corridor, following the trajectory prediction and ultimately failing the search.

Additionally, the probabilistic inheritance mechanism of the belief-guided field (Eq. 10) allows newly observed candidates to inherit belief from their parent frontiers in a decaying manner. This enables the robot to continuously adjust its focus toward more probable candidates until it locates the target, as shown in the third and fourth graphs of Fig. 15. Due to probabilistic inheritance, as seen in the third graph, ‘Cand. 4’ inherits belief from its predecessor in a decayed form, resulting in a belief of 0.018, which guides the robot to continue searching the room until the target person is found. Were probabilistic inheritance removed, the belief of ‘Cand. 4’ would drop to zero because of its distance from the probability area defined by the trajectory prediction, causing the robot to navigate incorrectly and fail the search. Our probabilistic inheritance exploits the local connectivity between newly produced and old frontiers, enabling a well-informed choice between inside-room and outside-room exploration.
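As a rough illustration of this inheritance behavior, the Python sketch below decays a parent frontier's belief with the distance to its child; the exponential form and the decay constant are our assumptions for illustration, while the paper's Eq. 10 defines the actual expression.

import numpy as np

# Illustrative belief inheritance: a child frontier keeps a distance-decayed
# fraction of its parent's belief instead of falling back to zero.
def inherited_belief(parent_belief, parent_pos, child_pos, decay=0.01):
    d = float(np.linalg.norm(np.asarray(child_pos) - np.asarray(parent_pos)))
    return parent_belief * np.exp(-decay * d)

# E.g., a child 3 m from a parent with belief 0.022 keeps about 0.021.
b = inherited_belief(0.022, [0.0, 0.0], [3.0, 0.0])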

For dynamic occlusion, the EDTOT results in Figs. 14(d), 14(e), and 14(f) likewise evidence the effectiveness of our method over the baselines. Our method achieves the lowest average distance, with 5.2 m in Dyna1, 5.0 m in Dyna2, and 5.5 m in Dyna3, surpassing the second-best baseline by 4.8 m, 2.3 m, and 8.8 m, respectively. These significant improvements confirm the effectiveness of our RPF-Search method in handling dynamic occlusions: by incorporating the proposed fluid-following and overtaking fields, the robot performs safe and smooth search behaviors when dealing with dynamic occlusions.

Figure 15: Searching keyframes of baselines and our method in topographic occlusion scenarios in Room3. The first row shows the explored occupancy map, frontiers (red squares), robot and target positions, ego view, and trajectories (orange and green lines). The second row presents the belief-guided graph, including robot and target trajectories, candidate frontiers with belief scores (e.g., “Cand. 4: 0.028”), and additional frontiers for probabilistic inheritance (grey squares). Initially, trajectory predictions direct the robot toward the corridor (red line), but our belief-guided graph updating integrates new environmental data and the target’s motion history to penalize unlikely corridor candidates and focus the search on more promising areas within the room.

V-E Limitations and Discussion

Our method demonstrates promising performance for person search in unknown and dynamic environments, but it is not yet an optimal solution. For instance, the current approach relies on a naive Intersection over Union (IoU)-based assessment to trigger the two fields. However, in more complex environments where the target person might transition from dynamic occlusion to topographic occlusion, this simplistic approach can mislead the robot. To address this, a more sophisticated occlusion identification method could be explored. One potential solution is leveraging a large language model or a visual-language model to engage in chain-of-thought reasoning, enabling the system to infer the target person’s occlusion state by analyzing occluders’ motion cues and the surrounding environment.
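For concreteness, the naive trigger discussed above could look like the following Python sketch, which classifies a track loss as a dynamic occlusion when the target's last bounding box overlaps a detected pedestrian's box; the box format and the 0.3 threshold are our assumptions, not the paper's values.

# Hypothetical sketch of an IoU-based occlusion trigger. Boxes are
# (x1, y1, x2, y2) in image coordinates; the threshold is illustrative.
def iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0.0 else 0.0

def classify_occlusion(last_target_box, pedestrian_boxes, thresh=0.3):
    # Dynamic -> observation-based field; topographic -> belief-guided field.
    if any(iou(last_target_box, p) > thresh for p in pedestrian_boxes):
        return "dynamic"
    return "topographic"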

Furthermore, our method faces challenges in handling long-term disappearances caused by topographic occlusion. Currently, the exclusion of impossible candidates relies heavily on an inference factor that is strongly tied to the duration of the disappearance. When the target is lost for an extended period, this factor becomes uniform across all candidates (frontiers), forcing the method to depend solely on trajectory prediction and probabilistic inheritance. This limitation can result in prolonged search times. To mitigate it, incorporating semantic cues could improve the exclusion process. For example, if prior knowledge indicates that the target person is looking for a drink with an empty bottle, locations such as meeting rooms or gyms would be unlikely to contain the person, allowing for a more focused and efficient search.

Additionally, our method can be seamlessly integrated with map-dependent person-search approaches. By dynamically constructing occupancy maps and frontier-based topological graphs during the RPF process, it naturally generates a topo-metric map. The frontiers, serving as nodes, represent safe navigation points and provide maximal observational value. Moreover, during RPF, if the robot accompanies a single user in the same location for an extended period, these nodes can encode the user’s appearance frequency, enabling a more efficient and targeted person search.