A Dynamic Adjusting Reward Function Method for Deep Reinforcement Learning with Adjustable Parameters

Abstract

In deep reinforcement learning, network convergence is often slow and the network easily converges to local optima. For environments with reward saltation, we propose a magnify saltatory reward (MSR) algorithm with adjustable parameters from the perspective of sample usage. MSR dynamically adjusts the rewards of experiences with reward saltation in the experience pool, thereby increasing the agent's utilization of these experiences. We conducted experiments in a simulated obstacle avoidance and search environment for an unmanned aerial vehicle and compared the results of deep Q-network (DQN), double DQN, and dueling DQN with and without MSR. The experimental results demonstrate that, after adding MSR, the algorithms converge faster and obtain the global optimal solution more easily.
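
The abstract only sketches the idea, so the following is a minimal Python illustration of reward magnification over an experience pool, assuming the pool is a list of (state, action, reward, next_state, done) tuples and that a transition counts as saltatory when its reward jumps by more than a chosen threshold. The function name magnify_saltatory_rewards and the threshold and magnification parameters are illustrative assumptions, not the paper's actual interface or parameter settings.

```python
def magnify_saltatory_rewards(transitions, saltation_threshold=1.0,
                              magnification=2.0):
    """Return a copy of replay-buffer transitions in which rewards that jump
    sharply relative to the previous transition (reward saltation) are
    amplified by an adjustable factor.

    transitions: list of (state, action, reward, next_state, done) tuples.
    Note: the detection rule and parameter values here are assumptions for
    illustration only.
    """
    adjusted = []
    prev_reward = None
    for state, action, reward, next_state, done in transitions:
        new_reward = reward
        # Treat a transition as saltatory if its reward differs from the
        # previous transition's reward by more than the threshold.
        if prev_reward is not None and abs(reward - prev_reward) > saltation_threshold:
            new_reward = reward * magnification
        adjusted.append((state, action, new_reward, next_state, done))
        prev_reward = reward
    return adjusted


# Example: a mostly-zero reward trajectory with one saltatory reward at the goal.
buffer = [(None, 0, 0.0, None, False),
          (None, 1, 0.0, None, False),
          (None, 0, 10.0, None, True)]   # reward saltation at the terminal step
print([r for _, _, r, _, _ in magnify_saltatory_rewards(buffer)])
# -> [0.0, 0.0, 20.0]
```

The magnified transitions would then be sampled by the DQN variant in place of the originals, which is one way to increase how strongly the rare, high-information saltatory experiences influence training.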

Citation (APA)

Hu, Z., Wan, K., Gao, X., & Zhai, Y. (2019). A Dynamic Adjusting Reward Function Method for Deep Reinforcement Learning with Adjustable Parameters. Mathematical Problems in Engineering, 2019. https://doi.org/10.1155/2019/7619483
