A general convergence method for reinforcement learning in the continuous case

Abstract

We propose a general method for designing convergent Reinforcement Learning algorithms in the case of continuous state-space and time variables. The method is based on the discretization of the continuous process by convergent approximation schemes: the Hamilton-Jacobi-Bellman equation is replaced by a Dynamic Programming (DP) equation for some Markov Decision Process (MDP). If the data of the MDP were known, we could solve the DP equation using standard DP updating rules. However, in the Reinforcement Learning (RL) approach, the state dynamics as well as the reinforcement functions are a priori unknown, making it impossible to apply DP rules directly. Here we prove a general convergence theorem which states that if the values updated by some RL algorithm are close enough to those of the DP updates (in the sense that they satisfy a "weak" contraction property), then they converge to the value function of the continuous process. The method is very general and is illustrated with a model-based algorithm built from a finite-difference approximation scheme.
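To make the DP side of the construction concrete, here is a minimal Python sketch of the kind of value-iteration update such a discretization yields, assuming the MDP data (transition probabilities p, reinforcements r, and interpolation time steps tau) have already been produced by an approximation scheme; the function name, array shapes, and the gamma**tau discounting convention are illustrative assumptions, not the paper's notation. In the RL setting of the paper these quantities are unknown and must be estimated from observed transitions, which is exactly where the "weak" contraction condition of the convergence theorem comes in.

    import numpy as np

    def value_iteration(P, R, tau, gamma, tol=1e-8, max_iter=10000):
        """Synchronous DP updates for a discretized MDP.

        P     : transition probabilities, shape (S, A, S) -- p(zeta | xi, a)
        R     : reinforcements, shape (S, A) -- r(xi, a)
        tau   : time steps of the scheme, shape (S, A) or scalar
        gamma : discount factor in (0, 1)
        """
        S, A, _ = P.shape
        V = np.zeros(S)
        for _ in range(max_iter):
            # DP update: Q(xi, a) = tau * r(xi, a)
            #            + gamma**tau * sum_zeta p(zeta | xi, a) * V(zeta)
            Q = tau * R + (gamma ** tau) * np.einsum('saz,z->sa', P, V)
            V_new = Q.max(axis=1)  # greedy over actions
            if np.max(np.abs(V_new - V)) < tol:
                return V_new
            V = V_new
        return V

Because gamma**tau < 1, each sweep is a contraction in the sup norm, so the iteration converges to the fixed point of the DP equation; the paper's result is that an RL algorithm whose updates only approximately track these sweeps still converges to the value function of the continuous process as the discretization is refined.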

Citation (APA)

Munos, R. (1998). A general convergence method for reinforcement learning in the continuous case. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 1398, pp. 394–405). Springer-Verlag. https://doi.org/10.1007/bfb0026710
