Weighted double Q-learning

Abstract

Q-learning is a popular reinforcement learning algorithm, but it can perform poorly in stochastic environments because it overestimates action values. The overestimation arises from the use of a single estimator, which takes the maximum action value as an approximation of the maximum expected action value. To avoid overestimation in Q-learning, the double Q-learning algorithm was proposed, based on the double estimator method: it maintains two estimators learned from independent sets of experiences, with one estimator selecting the maximizing action and the other providing the estimate of its value. Double Q-learning, however, sometimes underestimates action values. This paper introduces a weighted double Q-learning algorithm built on a weighted double estimator, with the goal of balancing the overestimation of the single estimator against the underestimation of the double estimator. Empirically, the new algorithm is shown to perform well on several MDP problems.
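
Since the abstract describes the weighted double estimator only at a high level, the sketch below illustrates the general idea in tabular form: a weight beta interpolates between the single-estimator target (ordinary Q-learning) and the double-estimator target (double Q-learning). The fixed constant beta and the function name weighted_double_q_update are illustrative assumptions; the paper's actual construction of the weight is not reproduced here.

```python
# A minimal tabular sketch of the weighted double Q-learning idea from the
# abstract. The fixed interpolation weight beta below is an illustrative
# assumption, not the paper's exact weight construction.

import random
from collections import defaultdict

def weighted_double_q_update(QA, QB, s, a, r, s_next, actions,
                             alpha=0.1, gamma=0.95, beta=0.5):
    """One update step. QA and QB map (state, action) -> value.

    beta = 1 recovers the single-estimator (Q-learning) target;
    beta = 0 recovers the double-estimator (double Q-learning) target.
    """
    # Randomly pick which table to update, as in double Q-learning.
    if random.random() < 0.5:
        Q_sel, Q_other = QA, QB
    else:
        Q_sel, Q_other = QB, QA

    # Maximizing action according to the selected estimator.
    a_star = max(actions, key=lambda ap: Q_sel[(s_next, ap)])

    # Weighted double estimate of the next-state value: interpolate between
    # the single estimate (same table) and the double estimate (other table).
    next_value = (beta * Q_sel[(s_next, a_star)]
                  + (1.0 - beta) * Q_other[(s_next, a_star)])

    td_target = r + gamma * next_value
    Q_sel[(s, a)] += alpha * (td_target - Q_sel[(s, a)])


# Toy usage: two zero-initialized tables, one transition.
QA = defaultdict(float)
QB = defaultdict(float)
weighted_double_q_update(QA, QB, s=0, a=1, r=1.0, s_next=2, actions=[0, 1])
```

With beta = 1 the update reduces to single-estimator Q-learning, and with beta = 0 to double Q-learning, which is exactly the balance between overestimation and underestimation that the abstract describes.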

Citation (APA)

Zhang, Z., Pan, Z., & Kochenderfer, M. J. (2017). Weighted double Q-learning. In IJCAI International Joint Conference on Artificial Intelligence (pp. 3455–3461). International Joint Conferences on Artificial Intelligence. https://doi.org/10.24963/ijcai.2017/483
