Prioritised experience replay based on sample optimisation

  • Wang X
  • Xiang H
  • Cheng Y
  • Yu Q

Abstract

This study proposes a sample-optimisation-based prioritised experience replay that addresses how samples are selected for the experience replay buffer, improving training speed and increasing the reward return. In the traditional deep Q-network (DQN), samples are placed into the replay buffer at random, yet each sample contributes differently to the agent's training; a better selection method makes training more effective. Therefore, when choosing samples for the replay buffer, the authors first let the agent learn through random exploration with a sample-optimisation network and take the average of the returns obtained after each round of learning, using this mean as the threshold for admitting samples into the replay buffer. Second, building on this sample optimisation, the authors add a priority update and apply the idea of reward shaping to give additional reward to the returns of certain samples, which speeds up agent training. Experiments on the OpenAI Gym platform show that the method improves the agent's learning efficiency compared with the traditional DQN and the prioritised experience replay DQN.
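The abstract does not give implementation details, so the following is only a minimal sketch of the selection idea it describes. It assumes the admission threshold is a running mean of observed rewards, that priorities follow the usual |TD error| + epsilon scheme from prioritised experience replay, and that reward shaping is a small fixed bonus added to admitted samples; the class name, parameters, and these choices are illustrative, not taken from the paper.

```python
# Sketch: a replay buffer that admits only samples whose reward exceeds a
# running-mean threshold, samples by priority, and applies a shaping bonus.
# Assumptions are noted above; this is not the authors' exact method.
import numpy as np
from collections import deque


class ThresholdedPrioritisedBuffer:
    def __init__(self, capacity=10_000, alpha=0.6, epsilon=1e-3, bonus=0.1):
        self.alpha = alpha          # how strongly priorities skew sampling
        self.epsilon = epsilon      # keeps stored priorities non-zero
        self.bonus = bonus          # illustrative reward-shaping bonus
        self.buffer = deque(maxlen=capacity)
        self.priorities = deque(maxlen=capacity)
        self.reward_sum = 0.0
        self.reward_count = 0

    def _threshold(self):
        # Mean of all rewards seen so far acts as the admission threshold.
        return self.reward_sum / self.reward_count if self.reward_count else 0.0

    def add(self, state, action, reward, next_state, done):
        threshold = self._threshold()
        self.reward_sum += reward
        self.reward_count += 1
        if self.reward_count > 1 and reward < threshold:
            return False  # below-average sample: not stored
        # Reward shaping: admitted samples receive a small extra return.
        shaped = reward + self.bonus
        self.buffer.append((state, action, shaped, next_state, done))
        # New samples start at the current maximum priority.
        self.priorities.append(max(self.priorities, default=1.0))
        return True

    def sample(self, batch_size):
        scaled = np.asarray(self.priorities, dtype=np.float64) ** self.alpha
        probs = scaled / scaled.sum()
        idx = np.random.choice(len(self.buffer), size=batch_size, p=probs)
        return idx, [self.buffer[i] for i in idx]

    def update_priorities(self, idx, td_errors):
        # Standard prioritised-replay update: priority ~ |TD error|.
        for i, err in zip(idx, td_errors):
            self.priorities[i] = abs(float(err)) + self.epsilon
```

In use, a DQN training loop would call `add` for every transition, draw minibatches with `sample`, and refresh priorities with `update_priorities` after computing TD errors, exactly as with a conventional prioritised buffer; only the admission test differs.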

Citation (APA)
Wang, X., Xiang, H., Cheng, Y., & Yu, Q. (2020). Prioritised experience replay based on sample optimisation. The Journal of Engineering, 2020(13), 298–302. https://doi.org/10.1049/joe.2019.1204
