Cross-Entropy Regularized Policy Gradient for Multirobot Nonadversarial Moving Target Search

Abstract

This article investigates the multirobot efficient search (MuRES) for a nonadversarial moving target problem from the multiagent reinforcement learning (MARL) perspective. MARL is deemed a promising research field for cooperative multiagent applications. However, one of the main bottlenecks of applying MARL to the MuRES problem is the nonstationarity introduced by multiple learning agents. With learning agents simultaneously updating their policies, the environment cannot be modeled as a stationary Markov decision process, which renders fundamental reinforcement learning techniques such as deep Q-network and policy gradient (PG) inapplicable. In view of that, we adopt the centralized training and decentralized execution scheme and thereby propose a cross-entropy regularized policy gradient (CE-PG) method to train the learning agents/robots. We let the robots commit to a predetermined policy during execution, collect the trajectories, and then perform centralized training for the corresponding policy improvement. In this way, the nonstationarity problem is overcome, in that the robots do not update their policies during execution. During the centralized training stage, we improve the canonical PG method to account for the interactions among robots by adding a cross-entropy regularization term, which essentially functions to 'disperse' the robots in the environment. Extensive simulation results and comparisons with the state of the art show CE-PG's superior performance, and we also validate the algorithm with a real multirobot system in an indoor moving target search scenario.
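To make the idea concrete, here is a minimal sketch of what a cross-entropy regularized PG objective could look like for discrete softmax policies. The abstract does not give the paper's exact formulation, so the function names (`ce_pg_loss`, `cross_entropy`), the regularization weight `lam`, and the pairwise form of the regularizer are all illustrative assumptions: the REINFORCE surrogate is penalized by subtracting the pairwise cross-entropy between robots' action distributions, so that minimizing the loss pushes the robots toward dissimilar (dispersed) policies.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over a 1-D logit vector.
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def cross_entropy(p, q, eps=1e-12):
    # H(p, q) = -sum_a p(a) log q(a); small eps avoids log(0).
    return -np.sum(p * np.log(q + eps))

def ce_pg_loss(logits, actions, returns, lam=0.1):
    """Illustrative CE-regularized PG loss (not the paper's exact form).

    logits  -- list of per-robot action-logit vectors at a state
    actions -- list of sampled actions, one per robot
    returns -- list of sampled returns, one per robot
    lam     -- assumed regularization weight (hypothetical)
    """
    probs = [softmax(l) for l in logits]

    # Canonical REINFORCE surrogate, summed over robots.
    pg = 0.0
    for p, a, g in zip(probs, actions, returns):
        pg += -np.log(p[a] + 1e-12) * g

    # Pairwise cross-entropy between robots' policies: larger when
    # the robots favor different actions.
    reg = 0.0
    n = len(probs)
    for i in range(n):
        for j in range(n):
            if i != j:
                reg += cross_entropy(probs[i], probs[j])

    # Subtracting the regularizer means minimizing the loss
    # *maximizes* pairwise cross-entropy, dispersing the robots.
    return pg - lam * reg
```

Under this sketch, two robots with identical policies incur a smaller (less favorable) regularization bonus than two robots concentrated on different actions, which is the dispersion effect the abstract describes.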

Citation (APA)

Guo, H., Liu, Z., Shi, R., Yau, W. Y., & Rus, D. (2023). Cross-Entropy Regularized Policy Gradient for Multirobot Nonadversarial Moving Target Search. IEEE Transactions on Robotics, 39(4), 2569–2584. https://doi.org/10.1109/TRO.2023.3263459
