Faster near-optimal reinforcement learning: Adding adaptiveness to the e3 algorithm

Citations of this article: 6 · Mendeley readers: 19

Abstract

Recently, Kearns and Singh presented the first provably efficient and near-optimal algorithm for reinforcement learning in general Markov decision processes. One of the key contributions of the algorithm is its explicit treatment of the exploration-exploitation trade-off. In this paper, we show how the algorithm can be improved by replacing the exploration phase, which builds a model of the underlying Markov decision process by estimating the transition probabilities, with an adaptive sampling method better suited to the problem. Our improvement is twofold. First, our theoretical bound on the worst-case time needed to converge to an almost optimal policy is significantly smaller. Second, we discuss how, due to the adaptiveness of the sampling method, our algorithm might perform better in practice than the previous one.
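
To make the idea in the abstract concrete, the sketch below shows one generic form of adaptive sampling for transition probabilities: instead of visiting a state-action pair a fixed number of times, sampling stops once a confidence bound on the empirical frequencies is tight enough. This is only an illustration under assumed choices (a Hoeffding-style stopping rule, a hypothetical `sample_next_state` simulator, and tolerance parameters `epsilon` and `delta`); it is not the algorithm from the paper, whose actual sampling method and bounds are given in the full text.

```python
import math
import random
from collections import Counter

def estimate_transitions(sample_next_state, state, action, n_states,
                         epsilon=0.05, delta=0.05, max_samples=100_000):
    """Adaptively estimate the next-state distribution of (state, action).

    Illustrative sketch only: samples are drawn until a Hoeffding-style
    confidence half-width for the empirical frequencies drops below `epsilon`
    (with failure probability `delta`), rather than using a fixed sample size.
    `sample_next_state(state, action)` is a hypothetical simulator callback.
    """
    counts = Counter()
    n = 0
    while n < max_samples:
        counts[sample_next_state(state, action)] += 1
        n += 1
        # Confidence half-width shared by all n_states estimated frequencies.
        half_width = math.sqrt(math.log(2 * n_states / delta) / (2 * n))
        if half_width <= epsilon:
            break
    return {s: c / n for s, c in counts.items()}, n


if __name__ == "__main__":
    # Toy two-outcome dynamics: the number of samples adapts to epsilon/delta
    # rather than being a hard-coded visit threshold.
    def toy_sampler(state, action):
        return 1 if random.random() < 0.7 else 0

    probs, used = estimate_transitions(toy_sampler, state=0, action=0, n_states=2)
    print(f"estimated transition probabilities {probs} after {used} samples")
```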

CITATION STYLE: APA

Domingo, C. (1999). Faster near-optimal reinforcement learning: Adding adaptiveness to the e3 algorithm. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 1720, pp. 241–251). Springer Verlag. https://doi.org/10.1007/3-540-46769-6_20
