Exponential moving averaged q-network for DDPG


Abstract

The instability and high variance of the Q-network lead to overestimation bias in the Deep Q-Network (DQN). Double DQN and Averaged-DQN mitigate this problem through different approaches. Building on Averaged-DQN, we theoretically prove the effective variance reduction of the Exponential Moving Average (EMA) in DQN and further illustrate its efficiency in the target network of Deep Deterministic Policy Gradient (DDPG). We then propose the A3QDDPG algorithm, which introduces an EMA Q-network that is independent of the target Q-network when updating the policy. Experiments on ten continuous control environments from MuJoCo show that A3QDDPG achieves better performance than DDPG in terms of average return, and the overestimation phenomenon of DDPG can also be observed in some environments in terms of average Q-value.
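The mechanism described in the abstract is an exponential moving average over the online Q-network's parameters, maintained separately from DDPG's usual target network and consulted when updating the policy. The following is a minimal sketch of such an EMA parameter update, assuming PyTorch; the network architecture, decay rate beta, dimensions, and function names are illustrative assumptions and are not taken from the paper.

import copy
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Minimal state-action value network (architecture is illustrative)."""
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([state, action], dim=-1))

def ema_update(ema_net: nn.Module, online_net: nn.Module, beta: float = 0.999) -> None:
    """Exponential moving average of parameters:
    theta_ema <- beta * theta_ema + (1 - beta) * theta_online."""
    with torch.no_grad():
        for p_ema, p in zip(ema_net.parameters(), online_net.parameters()):
            p_ema.mul_(beta).add_(p, alpha=1.0 - beta)

# Usage sketch: the EMA copy is kept independent of the DDPG target network
# and would be the critic queried in the policy (actor) update step.
q_online = QNetwork(state_dim=17, action_dim=6)
q_ema = copy.deepcopy(q_online)
# ... after each gradient step on q_online:
ema_update(q_ema, q_online, beta=0.999)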

Citation (APA)

Shen, X., Yin, C., Chai, Y., & Hou, X. (2019). Exponential moving averaged q-network for DDPG. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11857 LNCS, pp. 562–572). Springer. https://doi.org/10.1007/978-3-030-31654-9_48
