Monte Carlo Bias Correction in Q-Learning


Abstract

The Q-learning algorithm suffers from overestimation bias due to the maximum operator appearing in its update rule. Popular variants of Q-learning, such as double Q-learning, can on the other hand underestimate the action values. In many stochastic environments, both underestimation and overestimation can lead to sub-optimal strategies. In this paper, we present a variation of Q-learning that uses elements from Monte Carlo reinforcement learning to correct for the overestimation bias. Our method (1) makes no assumptions about the distributions of the action values or the rewards, (2) does not require extensive hyperparameter tuning, unlike other popular variants proposed to deal with the overestimation bias, and (3) requires storing only two estimators, similar to double Q-learning, along with the most recent episode. Our method is shown to effectively control for the overestimation bias in a number of simulated stochastic environments, leading to better policies with higher cumulative rewards and action values that are closer to the optimal ones, as compared to a number of well-established approaches.
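For context, the bias the abstract refers to stems from the maximum operator in the standard tabular Q-learning update, while double Q-learning decouples action selection from action evaluation using two estimators. The updates below are standard background (Watkins' Q-learning and van Hasselt's double Q-learning), sketched here for reference; they are not the Monte Carlo correction introduced in this paper.

Q-learning update:
Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right]

Double Q-learning update (updating Q^A; the roles of Q^A and Q^B are swapped with probability 1/2):
Q^A(s_t, a_t) \leftarrow Q^A(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \, Q^B\!\left(s_{t+1}, \arg\max_{a} Q^A(s_{t+1}, a)\right) - Q^A(s_t, a_t) \right]

Because \max_a Q(s_{t+1}, a) maximizes over the same noisy estimates it then evaluates, its expectation exceeds the maximum of the true action values, which produces the overestimation; double Q-learning's cross-evaluation removes this effect but can bias the estimates downward instead.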


Citation (APA)

Papadimitriou, D. (2023). Monte Carlo Bias Correction in Q-Learning. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 13539 LNAI, pp. 343–352). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-19907-3_33
