Scalable Multi-Robot Informative Path Planning for Target Mapping via Deep Reinforcement Learning

Apoorva Vashisth · Manav Kulshrestha · Damon Conover · Aniket Bera

Abstract

Autonomous robots are widely used for mapping and exploration tasks because they are cost-effective. Multi-robot systems offer scalability and efficiency, particularly when more robots are deployed in complex environments. Such tasks belong to the class of Multi-Robot Informative Path Planning (MRIPP) problems. In this paper, we propose a deep reinforcement learning approach for the MRIPP problem. Our goal is to maximize the number of stationary targets discovered in an unknown 3D environment under resource constraints such as path length. Each robot must maximize discovered targets, avoid unknown static obstacles, and prevent inter-robot collisions while operating under communication and resource constraints. We use the centralized training and decentralized execution paradigm to train a single policy neural network. A key aspect of our approach is a coordination graph that prioritizes visiting regions not yet explored by other robots. The learned policy can be copied onto any number of robots and deployed in more complex environments not seen during training. Our approach outperforms state-of-the-art approaches by at least 26.2% in the number of discovered targets while requiring less than 2 s of planning time per step. We present results in more complex environments with up to 64 robots and compare success rates against baseline planners.
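The centralized training and decentralized execution setup described in the abstract can be illustrated with a minimal Python sketch. It assumes a hypothetical `SharedPolicy` whose linear scoring stands in for the actual policy network, and a hypothetical `ExperienceBuffer` that pools every robot's transitions for one centralized update; neither is the authors' implementation. Because every robot queries the same weights using only its own local observation, the same object serves a team of 4 or 64 robots.

```python
import random
from dataclasses import dataclass, field


@dataclass
class SharedPolicy:
    """Hypothetical stand-in for the single trained policy network.

    Every robot runs a copy of the same weights, so the parameter count
    does not depend on the team size.
    """
    weights: list

    def act(self, candidate_features):
        # Linear scoring is a placeholder for the real neural network:
        # return the index of the highest-scoring candidate action.
        scores = [sum(w * f for w, f in zip(self.weights, feats))
                  for feats in candidate_features]
        return max(range(len(scores)), key=scores.__getitem__)


@dataclass
class ExperienceBuffer:
    """Hypothetical buffer pooling transitions from all robots for training."""
    transitions: list = field(default_factory=list)

    def add(self, robot_id, observation, action, reward):
        self.transitions.append((robot_id, observation, action, reward))

    def clear(self):
        # On-policy methods discard data once the policy has been updated.
        self.transitions.clear()


def decentralized_step(policy, local_observations):
    # Each robot decides using only its own local observation.
    return [policy.act(obs) for obs in local_observations]


def random_observation(rng, num_candidates=5, num_features=3):
    # Hypothetical observation: a feature vector per candidate action.
    return [[rng.random() for _ in range(num_features)]
            for _ in range(num_candidates)]


if __name__ == "__main__":
    rng = random.Random(0)
    policy = SharedPolicy([1.0, -0.5, 0.8])
    buffer = ExperienceBuffer()
    for team_size in (4, 64):  # the same policy serves any team size
        observations = [random_observation(rng) for _ in range(team_size)]
        actions = decentralized_step(policy, observations)
        for robot_id, (obs, act) in enumerate(zip(observations, actions)):
            buffer.add(robot_id, obs, act, reward=0.0)  # placeholder reward
    print(len(buffer.transitions))  # 68
```

Sharing one set of weights is what lets the trained policy be copied onto an arbitrary number of robots without retraining.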

Overview

[Figure: Overview of our approach]

Overview of our deep reinforcement learning approach for the multi-robot informative path planning problem. At each time step, our approach samples collision-free candidate actions in the robot's local region. Our coordination graph associates each candidate action with a utility value, the uncertainty of that utility value, and exploration features that model the regions visited by other robots. Our policy network uses these features to output the robot's state value and the next action to execute; executing that action produces a reward and new observations from the environment. The black arrows indicate the robot control loop, while the green arrows and green boxes indicate the variables stored in the experience buffer for on-policy training of our policy network.
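The candidate-sampling and feature step shown in the figure can be sketched as follows, assuming a hypothetical 3D grid of integer cells. The feature definitions are illustrative assumptions, not the paper's coordination graph: utility is the expected number of targets in a cubic sensor footprint, uncertainty is the mean Bernoulli variance p(1 - p) of the target belief, and the exploration feature is the fraction of the footprint already visited by teammates. The collision check only tests the candidate cell against obstacles observed so far, not the path segment leading to it. The functions `neighbours`, `sample_candidates`, and `candidate_features` are hypothetical.

```python
import itertools
import random


def neighbours(cell, radius):
    """All grid cells in a cube of side 2 * radius + 1 centred on `cell`."""
    x, y, z = cell
    offsets = range(-radius, radius + 1)
    return [(x + dx, y + dy, z + dz)
            for dx, dy, dz in itertools.product(offsets, repeat=3)]


def sample_candidates(position, known_obstacles, num_samples, radius, rng):
    """Sample candidate waypoints in the local region that avoid known obstacles.

    Assumption: only the endpoint cell is checked; unknown obstacles
    cannot be avoided because they have not been observed yet.
    """
    free = [c for c in neighbours(position, radius)
            if c != position and c not in known_obstacles]
    return rng.sample(free, min(num_samples, len(free)))


def candidate_features(candidate, target_belief, visited_by_others, sensor_radius):
    """Return hypothetical (utility, uncertainty, overlap) for one candidate."""
    footprint = neighbours(candidate, sensor_radius)
    # Cells with no belief entry are treated as p = 0.5 (an assumption).
    probs = [target_belief.get(c, 0.5) for c in footprint]
    utility = sum(probs)  # expected number of targets in the footprint
    uncertainty = sum(p * (1.0 - p) for p in probs) / len(probs)
    # Fraction of the footprint already covered by other robots.
    overlap = sum(c in visited_by_others for c in footprint) / len(footprint)
    return utility, uncertainty, overlap


if __name__ == "__main__":
    rng = random.Random(1)
    obstacles = {(1, 0, 0), (0, 1, 0)}
    teammates_visited = {(2, 0, 0), (2, 1, 0)}
    for cand in sample_candidates((0, 0, 0), obstacles, 4, 1, rng):
        print(cand, candidate_features(cand, {}, teammates_visited, 1))
```

In this sketch a policy would receive the three values per candidate and could learn to favor high utility and low overlap with teammates' visited regions.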

Results

[Figure: Experimental results for our approach]

Comparison of our approach with baseline planners in an urban environment. Our performance metric is the percentage of targets discovered during the episode. The solid lines show the mean across 250 trials, and the shaded areas denote the standard deviation.
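The plotted metric can be computed as in the sketch below. It assumes each trial is logged as a per-step count of discovered targets and uses the sample standard deviation for the shaded band; the source does not state which estimator was used. The helpers `discovered_percentage` and `aggregate_curves` are hypothetical.

```python
import statistics


def discovered_percentage(num_discovered, num_targets):
    """Percentage of targets discovered so far in one episode."""
    return 100.0 * num_discovered / num_targets


def aggregate_curves(curves):
    """Mean and standard deviation at each time step across trials.

    curves[i][t] is the percentage of targets discovered in trial i by step t.
    All trials are assumed to have the same length.
    """
    means, stds = [], []
    for values in zip(*curves):  # iterate over time steps
        means.append(statistics.mean(values))
        # Sample standard deviation needs at least two trials.
        stds.append(statistics.stdev(values) if len(values) > 1 else 0.0)
    return means, stds


if __name__ == "__main__":
    num_targets = 20
    trial_a = [discovered_percentage(n, num_targets) for n in (0, 5, 10)]
    trial_b = [discovered_percentage(n, num_targets) for n in (0, 7, 12)]
    means, stds = aggregate_curves([trial_a, trial_b])
    print(means)  # [0.0, 30.0, 55.0]
```

With 250 trials per planner, the shaded region at step t would typically span means[t] - stds[t] to means[t] + stds[t].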

Cite

@article{vashisth2026scalable,
  title     = {Scalable Multi-Robot Informative Path Planning for Target Mapping via Deep Reinforcement Learning},
  author    = {Vashisth, Apoorva and
               Kulshrestha, Manav and
               Conover, Damon and
               Bera, Aniket},
  journal   = {IEEE Robotics and Automation Letters},
  year      = {2026},
  publisher = {IEEE}
}