Accurate Q-learning

3Citations
Citations of this article
2Readers
Mendeley users who have this article in their library.
Get full text

Abstract

In order to solve the problem that Q-learning can suffer from large overestimations in some stochastic environments, we first propose a new form of Q-learning, which proves that it is equivalent to the incremental form and analyze the reasons why the convergence rate of Q-learning will be affected by positive bias. We generalize the new form for the purpose of easy adaptations. By using the current value instead of the bias term, we present an accurate Q-learning algorithm and show that the new algorithm converges to an optimal policy. Experimentally, the new algorithm can avoid the effect of positive bias and the convergence rate is faster than Q-learning and its variants on several MDP problems.

Cite

CITATION STYLE

APA

Hu, Z., Jiang, Y., Ling, X., & Liu, Q. (2018). Accurate Q-learning. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11303 LNCS, pp. 560–570). Springer Verlag. https://doi.org/10.1007/978-3-030-04182-3_49

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free