The Guiding Role of Reward Based on Phased Goal in Reinforcement Learning

Abstract

Sparse and delayed rewards greatly hinder deep reinforcement learning, which is expected to acquire an optimal policy by learning from trajectories. Reward shaping, previously introduced to accelerate learning, is one of the most effective methods for tackling this crucial yet challenging problem. However, how to implement reward shaping reasonably still needs to be explored: current reward-shaping methods usually require a large number of expert demonstrations, and the environment is poorly explored. In this paper, we propose a reward-shaping method, a reinforcement learning framework based on phased goals, which accelerates learning convergence with fewer expert examples and explores better, especially in tasks where environment rewards are particularly sparse. The framework consists of a reward based on phased goals and policy learning using PPO2. The process of acquiring the designed reward is divided into stage classification and calculation of goal proximity. Experiments show that our method effectively alleviates the problem of sparse rewards and obtains higher scores on Atari games than the baseline algorithm.

Citation (APA)

Liu, Y., & Hu, Z. (2020). The Guiding Role of Reward Based on Phased Goal in Reinforcement Learning. In ACM International Conference Proceeding Series (pp. 535–541). Association for Computing Machinery. https://doi.org/10.1145/3383972.3384039
