Diversity Evolutionary Policy Deep Reinforcement Learning

Abstract

Reinforcement learning algorithms based on policy gradients may fall into local optima due to vanishing gradients during the update process, which in turn limits the exploration ability of the agent. To address this problem, this paper combines the cross-entropy method (CEM) from evolutionary policy search, the maximum mean discrepancy (MMD), and the twin delayed deep deterministic policy gradient algorithm (TD3) into a diversity evolutionary policy deep reinforcement learning (DEPRL) algorithm. Using the maximum mean discrepancy as a measure of the distance between policies, some of the policies in the population maximize both their distance from the previous generation's policies and the cumulative return during the gradient update. Furthermore, combining the cumulative return and the distance between policies into the population fitness encourages more diversity among offspring policies, which in turn reduces the risk of falling into local optima caused by vanishing gradients. Results in the MuJoCo test environments show that DEPRL achieves excellent performance on continuous control tasks; in particular, in the Ant-v2 environment, the final return of DEPRL is nearly 20% higher than that of TD3.
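To illustrate the distance measure described above, the sketch below estimates the squared MMD between the actions two policies produce on a shared batch of states and folds it into a diversity-aware fitness score. This is a minimal sketch of the general idea, not the authors' implementation; the RBF kernel, the bandwidth sigma, and the weighting coefficient beta are assumptions introduced here for illustration.

    import numpy as np

    def rbf_kernel(x, y, sigma=1.0):
        # Gaussian (RBF) kernel between two batches of action vectors.
        diff = x[:, None, :] - y[None, :, :]      # shape (n, m, act_dim)
        sq_dist = np.sum(diff ** 2, axis=-1)      # shape (n, m)
        return np.exp(-sq_dist / (2.0 * sigma ** 2))

    def mmd_squared(actions_a, actions_b, sigma=1.0):
        # Simple (biased) estimate of the squared MMD between two sets of
        # actions produced by two policies on the same batch of states.
        k_aa = rbf_kernel(actions_a, actions_a, sigma).mean()
        k_bb = rbf_kernel(actions_b, actions_b, sigma).mean()
        k_ab = rbf_kernel(actions_a, actions_b, sigma).mean()
        return k_aa + k_bb - 2.0 * k_ab

    def diversity_fitness(cumulative_return, actions_new, actions_prev, beta=1.0):
        # Fitness that rewards both a high return and distance from the
        # previous generation's policy, as described in the abstract.
        return cumulative_return + beta * mmd_squared(actions_new, actions_prev)

    # Example usage with two hypothetical linear policies evaluated on the
    # same batch of states (shapes and values are purely illustrative).
    rng = np.random.default_rng(0)
    states = rng.normal(size=(64, 17))
    actions_new = np.tanh(states @ rng.normal(size=(17, 6)))
    actions_prev = np.tanh(states @ rng.normal(size=(17, 6)))
    fitness = diversity_fitness(1000.0, actions_new, actions_prev, beta=5.0)

In the full DEPRL setup summarized in the abstract, a fitness value of this kind would rank population members within the CEM loop while TD3 performs the gradient updates; those components are omitted from this sketch.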

Citation (APA)

Liu, J., & Feng, L. (2021). Diversity Evolutionary Policy Deep Reinforcement Learning. Computational Intelligence and Neuroscience, 2021. https://doi.org/10.1155/2021/5300189
