To address the large overestimations that Q-learning can suffer in some stochastic environments, we first propose a new form of Q-learning, prove that it is equivalent to the incremental form, and analyze why positive bias slows the convergence of Q-learning. We then generalize the new form so that it can be adapted easily. By replacing the bias term with the current value estimate, we obtain an accurate Q-learning algorithm and show that the new algorithm converges to an optimal policy. Experimentally, the new algorithm avoids the effect of positive bias and converges faster than Q-learning and its variants on several MDP problems.
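For context, the sketch below shows the classical incremental Q-learning update that the abstract refers to, not the paper's corrected variant; the max operator in the target is the known source of the positive (overestimation) bias the paper addresses. The function name and the toy table are illustrative assumptions.

```python
# Classical tabular Q-learning update (illustrative; not the paper's
# bias-corrected algorithm). The max over next-state actions causes
# positive bias under noisy estimates, since
# E[max_a Q(s', a)] >= max_a E[Q(s', a)].

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """One incremental step: Q(s,a) += alpha * (TD target - Q(s,a))."""
    target = r + gamma * max(Q[s_next].values())  # max introduces positive bias
    Q[s][a] += alpha * (target - Q[s][a])
    return Q[s][a]

# Toy two-state table for illustration.
Q = {0: {"a": 0.0, "b": 0.0}, 1: {"a": 1.0, "b": 2.0}}
q_update(Q, 0, "a", r=1.0, s_next=1)  # target = 1 + 0.9*2 = 2.8
```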
Hu, Z., Jiang, Y., Ling, X., & Liu, Q. (2018). Accurate Q-learning. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11303 LNCS, pp. 560–570). Springer Verlag. https://doi.org/10.1007/978-3-030-04182-3_49