A general convergence method for reinforcement learning in the continuous case

Abstract

We propose a general method for designing convergent Reinforcement Learning algorithms in the case of continuous state-space and time variables. The method is based on the discretization of the continuous process by convergent approximation schemes: the Hamilton-Jacobi-Bellman equation is replaced by a Dynamic Programming (DP) equation for some Markov Decision Process (MDP). If the data of the MDP were known, we could solve the DP equation using standard DP updating rules. However, in the Reinforcement Learning (RL) approach, the state dynamics as well as the reinforcement functions are a priori unknown, making it impossible to apply DP rules directly. Here we prove a general convergence theorem which states that if the values updated by some RL algorithm are close enough to those of the DP updates (in the sense that they satisfy a "weak" contraction property), then they converge to the value function of the continuous process. The method is very general and is illustrated with a model-based algorithm built from a finite-difference approximation scheme.
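To make the DP side of the construction concrete, here is a minimal Python sketch of the kind of value-iteration update such a discretization yields, assuming the MDP data (transition probabilities p, reinforcements r, and interpolation time steps tau) have already been produced by an approximation scheme; the function name, array shapes, and the gamma**tau discounting convention are illustrative assumptions, not the paper's notation. In the RL setting of the paper these quantities are unknown and must be estimated from observed transitions, which is exactly where the "weak" contraction condition of the convergence theorem comes in.

    import numpy as np

    def value_iteration(P, R, tau, gamma, tol=1e-8, max_iter=10000):
        """Synchronous DP updates for a discretized MDP.

        P     : transition probabilities, shape (S, A, S) -- p(zeta | xi, a)
        R     : reinforcements, shape (S, A) -- r(xi, a)
        tau   : time steps of the scheme, shape (S, A) or scalar
        gamma : discount factor in (0, 1)
        """
        S, A, _ = P.shape
        V = np.zeros(S)
        for _ in range(max_iter):
            # DP update: Q(xi, a) = tau * r(xi, a)
            #            + gamma**tau * sum_zeta p(zeta | xi, a) * V(zeta)
            Q = tau * R + (gamma ** tau) * np.einsum('saz,z->sa', P, V)
            V_new = Q.max(axis=1)  # greedy over actions
            if np.max(np.abs(V_new - V)) < tol:
                return V_new
            V = V_new
        return V

Because gamma**tau < 1, each sweep is a contraction in the sup norm, so the iteration converges to the fixed point of the DP equation; the paper's result is that an RL algorithm whose updates only approximately track these sweeps still converges to the value function of the continuous process as the discretization is refined.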

Citation (APA)

Munos, R. (1998). A general convergence method for reinforcement learning in the continuous case. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 1398, pp. 394–405). Springer-Verlag. https://doi.org/10.1007/bfb0026710
