Weighted double Q-learning

Abstract

Q-learning is a popular reinforcement learning algorithm, but it can perform poorly in stochastic environments because it overestimates action values. The overestimation arises from the use of a single estimator, which takes the maximum action value as an approximation of the maximum expected action value. To avoid overestimation in Q-learning, the double Q-learning algorithm was proposed, based on the double estimator method: it maintains two estimators learned from independent sets of experiences, with one estimator selecting the maximizing action and the other providing the estimate of its value. Double Q-learning, however, sometimes underestimates action values. This paper introduces a weighted double Q-learning algorithm built on a weighted double estimator, with the goal of balancing the overestimation of the single estimator against the underestimation of the double estimator. Empirically, the new algorithm is shown to perform well on several MDP problems.
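
Since the abstract describes the weighted double estimator only at a high level, the sketch below illustrates the general idea in tabular form: a weight beta interpolates between the single-estimator target (ordinary Q-learning) and the double-estimator target (double Q-learning). The fixed constant beta and the function name weighted_double_q_update are illustrative assumptions; the paper's actual construction of the weight is not reproduced here.

```python
# A minimal tabular sketch of the weighted double Q-learning idea from the
# abstract. The fixed interpolation weight beta below is an illustrative
# assumption, not the paper's exact weight construction.

import random
from collections import defaultdict

def weighted_double_q_update(QA, QB, s, a, r, s_next, actions,
                             alpha=0.1, gamma=0.95, beta=0.5):
    """One update step. QA and QB map (state, action) -> value.

    beta = 1 recovers the single-estimator (Q-learning) target;
    beta = 0 recovers the double-estimator (double Q-learning) target.
    """
    # Randomly pick which table to update, as in double Q-learning.
    if random.random() < 0.5:
        Q_sel, Q_other = QA, QB
    else:
        Q_sel, Q_other = QB, QA

    # Maximizing action according to the selected estimator.
    a_star = max(actions, key=lambda ap: Q_sel[(s_next, ap)])

    # Weighted double estimate of the next-state value: interpolate between
    # the single estimate (same table) and the double estimate (other table).
    next_value = (beta * Q_sel[(s_next, a_star)]
                  + (1.0 - beta) * Q_other[(s_next, a_star)])

    td_target = r + gamma * next_value
    Q_sel[(s, a)] += alpha * (td_target - Q_sel[(s, a)])


# Toy usage: two zero-initialized tables, one transition.
QA = defaultdict(float)
QB = defaultdict(float)
weighted_double_q_update(QA, QB, s=0, a=1, r=1.0, s_next=2, actions=[0, 1])
```

With beta = 1 the update reduces to single-estimator Q-learning, and with beta = 0 to double Q-learning, which is exactly the balance between overestimation and underestimation that the abstract describes.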

Citation (APA)

Zhang, Z., Pan, Z., & Kochenderfer, M. J. (2017). Weighted double Q-learning. In IJCAI International Joint Conference on Artificial Intelligence (pp. 3455–3461). International Joint Conferences on Artificial Intelligence. https://doi.org/10.24963/ijcai.2017/483
