Cross-Entropy Regularized Policy Gradient for Multirobot Nonadversarial Moving Target Search

Abstract

This article investigates the multirobot efficient search (MuRES) for a nonadversarial moving target problem from the multiagent reinforcement learning (MARL) perspective. MARL is deemed a promising research field for cooperative multiagent applications. However, one of the main bottlenecks of applying MARL to the MuRES problem is the nonstationarity introduced by multiple learning agents. With learning agents simultaneously updating their policies, the environment cannot be modeled as a stationary Markov decision process, which renders fundamental reinforcement learning techniques such as deep Q-network and policy gradient (PG) inapplicable. In view of that, we adopt the centralized training and decentralized execution scheme and thereby propose a cross-entropy regularized policy gradient (CE-PG) method to train the learning agents/robots. We let the robots commit to a predetermined policy during execution, collect the trajectories, and then perform centralized training for the corresponding policy improvement. In this way, the nonstationarity problem is overcome, in that the robots do not update their policies during execution. During the centralized training stage, we improve the canonical PG method to account for the interactions among robots by adding a cross-entropy regularization term, which essentially functions to 'disperse' the robots in the environment. Extensive simulation results and comparisons with the state of the art show CE-PG's superior performance, and we also validate the algorithm with a real multirobot system in an indoor moving target search scenario.
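To make the idea concrete, here is a minimal sketch of what a cross-entropy regularized PG objective could look like for discrete softmax policies. The abstract does not give the paper's exact formulation, so the function names (`ce_pg_loss`, `cross_entropy`), the regularization weight `lam`, and the pairwise form of the regularizer are all illustrative assumptions: the REINFORCE surrogate is penalized by subtracting the pairwise cross-entropy between robots' action distributions, so that minimizing the loss pushes the robots toward dissimilar (dispersed) policies.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over a 1-D logit vector.
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def cross_entropy(p, q, eps=1e-12):
    # H(p, q) = -sum_a p(a) log q(a); small eps avoids log(0).
    return -np.sum(p * np.log(q + eps))

def ce_pg_loss(logits, actions, returns, lam=0.1):
    """Illustrative CE-regularized PG loss (not the paper's exact form).

    logits  -- list of per-robot action-logit vectors at a state
    actions -- list of sampled actions, one per robot
    returns -- list of sampled returns, one per robot
    lam     -- assumed regularization weight (hypothetical)
    """
    probs = [softmax(l) for l in logits]

    # Canonical REINFORCE surrogate, summed over robots.
    pg = 0.0
    for p, a, g in zip(probs, actions, returns):
        pg += -np.log(p[a] + 1e-12) * g

    # Pairwise cross-entropy between robots' policies: larger when
    # the robots favor different actions.
    reg = 0.0
    n = len(probs)
    for i in range(n):
        for j in range(n):
            if i != j:
                reg += cross_entropy(probs[i], probs[j])

    # Subtracting the regularizer means minimizing the loss
    # *maximizes* pairwise cross-entropy, dispersing the robots.
    return pg - lam * reg
```

Under this sketch, two robots with identical policies incur a smaller (less favorable) regularization bonus than two robots concentrated on different actions, which is the dispersion effect the abstract describes.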

Citation (APA)

Guo, H., Liu, Z., Shi, R., Yau, W. Y., & Rus, D. (2023). Cross-Entropy Regularized Policy Gradient for Multirobot Nonadversarial Moving Target Search. IEEE Transactions on Robotics, 39(4), 2569–2584. https://doi.org/10.1109/TRO.2023.3263459
