Q-learning is a popular reinforcement learning algorithm, but it can perform poorly in stochastic environments because it overestimates action values. The overestimation arises from the use of a single estimator, which takes the maximum action value as an approximation of the maximum expected action value. To avoid this overestimation, the double Q-learning algorithm was recently proposed; it uses the double estimator method, maintaining two estimators learned from independent sets of experiences, with one estimator determining the maximizing action and the other providing the estimate of its value. Double Q-learning, however, sometimes underestimates the action values. This paper introduces a weighted double Q-learning algorithm based on the construction of a weighted double estimator, with the goal of balancing between the overestimation of the single estimator and the underestimation of the double estimator. Empirically, the new algorithm is shown to perform well on several MDP problems.
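The construction described in the abstract can be sketched as a tabular update whose target interpolates between the single-estimator value (plain Q-learning) and the cross-estimator value (double Q-learning). Below is a minimal Python sketch under stated assumptions: the fixed interpolation weight `beta` is a simplification for illustration (the paper constructs an adaptive weight), and the names `Q_U`, `Q_V`, and `weighted_double_q_update` are hypothetical, not from the source.

```python
import numpy as np

def weighted_double_q_update(Q_U, Q_V, s, a, r, s_next,
                             alpha=0.1, gamma=0.99, beta=0.5, rng=None):
    """One tabular update step. Q_U and Q_V are (n_states, n_actions) arrays.

    `beta` blends the two targets: beta = 1 recovers standard Q-learning
    (single estimator), beta = 0 recovers double Q-learning (double
    estimator). A fixed beta is an illustrative simplification only.
    """
    rng = rng or np.random.default_rng()
    # Randomly choose which estimator to update, as in double Q-learning.
    if rng.random() < 0.5:
        A, B = Q_U, Q_V
    else:
        A, B = Q_V, Q_U
    # The updated estimator selects the maximizing action...
    a_star = np.argmax(A[s_next])
    # ...and the target blends its own value for that action (single
    # estimator) with the other estimator's value (double estimator).
    target = r + gamma * (beta * A[s_next, a_star]
                          + (1.0 - beta) * B[s_next, a_star])
    A[s, a] += alpha * (target - A[s, a])
```

Because the blended target sits between the single-estimator and double-estimator targets, the update trades off their overestimation and underestimation biases, which is the balance the paper's weighted double estimator is designed to achieve.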
CITATION STYLE
Zhang, Z., Pan, Z., & Kochenderfer, M. J. (2017). Weighted double Q-learning. In IJCAI International Joint Conference on Artificial Intelligence (pp. 3455–3461). International Joint Conferences on Artificial Intelligence. https://doi.org/10.24963/ijcai.2017/483