Abstract
Sparse and delayed rewards have greatly hindered deep reinforcement learning, which is expected to acquire an optimal policy by learning from trajectories. Reward shaping, previously introduced to accelerate learning, is one of the most effective ways to tackle this crucial yet challenging problem. However, how to implement reward shaping reasonably still needs to be explored: current reward-shaping methods usually require a large number of expert demonstrations, and the environment remains poorly explored. In this paper, we propose a reward-shaping method, a reinforcement learning framework based on phased goals, which accelerates learning convergence with fewer expert examples and explores more effectively, especially in tasks where environment rewards are particularly sparse. The framework consists of a reward based on phased goals and policy learning using PPO2. The process of computing the designed reward is divided into stage classification and calculation of goal proximity. Experiments show that our method can effectively alleviate the sparse-reward problem and obtain higher scores on Atari games than the baseline algorithm.
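The abstract describes a shaped reward built from two terms: a stage-classification term and a goal-proximity term, added on top of the sparse environment reward. The sketch below is only an illustrative guess at that structure, assuming hypothetical names (`shaped_reward`, `stage_goal`) and a Euclidean proximity measure; the paper's exact formulation is not given in the abstract.

```python
# Hypothetical sketch of a phased-goal shaped reward.
# All function and parameter names are illustrative assumptions,
# not the authors' actual implementation.
import math

def shaped_reward(env_reward, stage, num_stages, state, stage_goal):
    """Combine the sparse environment reward with a dense phased-goal bonus.

    env_reward -- the (often zero) reward returned by the environment
    stage      -- index of the sub-goal stage the agent is currently in
    stage_goal -- target feature vector for the current stage (assumed)
    """
    # Stage-classification term: reward progress through the ordered stages.
    stage_bonus = stage / num_stages
    # Goal-proximity term: the closer the state is to the current stage's
    # goal, the larger the bonus (here an inverse-distance heuristic).
    distance = math.dist(state, stage_goal)
    proximity_bonus = 1.0 / (1.0 + distance)
    return env_reward + stage_bonus + proximity_bonus
```

Such a dense signal would then be fed to PPO2 in place of the raw environment reward, giving the policy gradient a learning signal even before any sparse reward is reached.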
Citation
Liu, Y., & Hu, Z. (2020). The Guiding Role of Reward Based on Phased Goal in Reinforcement Learning. In ACM International Conference Proceeding Series (pp. 535–541). Association for Computing Machinery. https://doi.org/10.1145/3383972.3384039