Ensemble Network Architecture for Deep Reinforcement Learning


The popular deep Q-learning algorithm is known to be unstable because of Q-value oscillation and the overestimation of action values under certain conditions, issues that tend to adversely affect its performance. In this paper, we develop an ensemble network architecture for deep reinforcement learning based on value function approximation. The temporal ensemble stabilizes the training process by reducing the variance of the target approximation error, while the ensemble of target values reduces overestimation and improves performance by estimating more accurate Q-values. Our results show that this architecture leads to statistically significant improvements in value estimation and to more stable and better performance on several classical control tasks in the OpenAI Gym environment.
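The core idea of ensembling target values can be illustrated with a minimal sketch. The snippet below assumes K target networks whose Q-estimates for a batch of next states are stacked into one array; averaging them before the greedy max reduces the upward bias of any single network's noisy estimate. The function name, array shapes, and hyperparameters are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def ensemble_td_target(q_estimates, rewards, dones, gamma=0.99):
    """Compute TD targets using an ensemble of target-network Q-estimates.

    q_estimates : (K, batch, n_actions) array, one slice per target network
                  (illustrative shape, not from the paper)
    rewards     : (batch,) immediate rewards
    dones       : (batch,) 1.0 if the episode terminated, else 0.0
    """
    # Average Q-values across the K ensemble members first; this damps the
    # overestimation that a max over a single noisy estimate would produce.
    mean_q = np.mean(q_estimates, axis=0)        # (batch, n_actions)
    greedy_q = np.max(mean_q, axis=1)            # (batch,)
    # Standard one-step TD target, zeroed at terminal states.
    return rewards + gamma * (1.0 - dones) * greedy_q

# Tiny usage example: two target networks, one transition, two actions.
q = np.array([[[1.0, 2.0]],     # network 1's Q-values for the next state
              [[3.0, 4.0]]])    # network 2's Q-values for the next state
target = ensemble_td_target(q, rewards=np.array([1.0]),
                            dones=np.array([0.0]))
# mean over networks -> [2.0, 3.0]; greedy max -> 3.0; target = 1 + 0.99 * 3
```

Averaging before the max is one simple way to realize the variance- and bias-reduction effect the abstract describes; schemes that weight ensemble members differently are equally compatible with this interface.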




Chen, X. L., Cao, L., Li, C. X., Xu, Z. X., & Lai, J. (2018). Ensemble Network Architecture for Deep Reinforcement Learning. Mathematical Problems in Engineering, 2018. https://doi.org/10.1155/2018/2129393
