Faster near-optimal reinforcement learning: Adding adaptiveness to the e3 algorithm

Citations of this article: 6 · Mendeley readers: 19

Abstract

Recently, Kearns and Singh presented the first provably efficient and near-optimal algorithm for reinforcement learning in general Markov decision processes. One of the key contributions of the algorithm is its explicit treatment of the exploration-exploitation trade-off. In this paper, we show how the algorithm can be improved by replacing the exploration phase, which builds a model of the underlying Markov decision process by estimating the transition probabilities, with an adaptive sampling method better suited to the problem. Our improvement is twofold. First, our theoretical bound on the worst-case time needed to converge to an almost optimal policy is significantly smaller. Second, we discuss how, due to the adaptiveness of the sampling method, our algorithm might perform better in practice than the previous one.
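
To make the idea in the abstract concrete, the sketch below shows one generic form of adaptive sampling for transition probabilities: instead of visiting a state-action pair a fixed number of times, sampling stops once a confidence bound on the empirical frequencies is tight enough. This is only an illustration under assumed choices (a Hoeffding-style stopping rule, a hypothetical `sample_next_state` simulator, and tolerance parameters `epsilon` and `delta`); it is not the algorithm from the paper, whose actual sampling method and bounds are given in the full text.

```python
import math
import random
from collections import Counter

def estimate_transitions(sample_next_state, state, action, n_states,
                         epsilon=0.05, delta=0.05, max_samples=100_000):
    """Adaptively estimate the next-state distribution of (state, action).

    Illustrative sketch only: samples are drawn until a Hoeffding-style
    confidence half-width for the empirical frequencies drops below `epsilon`
    (with failure probability `delta`), rather than using a fixed sample size.
    `sample_next_state(state, action)` is a hypothetical simulator callback.
    """
    counts = Counter()
    n = 0
    while n < max_samples:
        counts[sample_next_state(state, action)] += 1
        n += 1
        # Confidence half-width shared by all n_states estimated frequencies.
        half_width = math.sqrt(math.log(2 * n_states / delta) / (2 * n))
        if half_width <= epsilon:
            break
    return {s: c / n for s, c in counts.items()}, n


if __name__ == "__main__":
    # Toy two-outcome dynamics: the number of samples adapts to epsilon/delta
    # rather than being a hard-coded visit threshold.
    def toy_sampler(state, action):
        return 1 if random.random() < 0.7 else 0

    probs, used = estimate_transitions(toy_sampler, state=0, action=0, n_states=2)
    print(f"estimated transition probabilities {probs} after {used} samples")
```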

CITATION STYLE: APA

Domingo, C. (1999). Faster near-optimal reinforcement learning: Adding adaptiveness to the e3 algorithm. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 1720, pp. 241–251). Springer Verlag. https://doi.org/10.1007/3-540-46769-6_20
