Fitted natural actor-critic: A new algorithm for continuous state-action MDPs

Abstract

In this paper we address reinforcement learning problems with continuous state-action spaces. We propose a new algorithm, fitted natural actor-critic (FNAC), that extends the work in [1] to allow for general function approximation and data reuse. We combine the natural actor-critic architecture of [1] with a variant of fitted value iteration that uses importance sampling. The resulting method combines the appealing features of both approaches while overcoming their main weaknesses: the gradient-based actor readily overcomes the difficulties that regression methods face when optimizing policies over continuous action spaces, while the regression-based critic allows for efficient use of data and avoids the convergence problems that TD-based critics often exhibit. We establish the convergence of our algorithm and illustrate its application in a simple continuous-state, continuous-action problem. © 2008 Springer-Verlag Berlin Heidelberg.
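
The abstract describes FNAC only at a high level, so the following is a minimal illustrative sketch of a natural actor-critic with a regression-based (batch-fitted) critic, in the spirit of the method described. It is not the authors' algorithm: it fits the critic to Monte Carlo returns rather than via fitted value iteration, it omits the importance-sampling data reuse the paper introduces, and the toy 1-d dynamics, feature maps, and step sizes are all assumptions made here for illustration.

import numpy as np

# Illustrative sketch only: a linear-Gaussian natural actor-critic with a
# regression-based critic. The toy dynamics, features, and hyperparameters
# below are assumptions, not taken from the paper.

rng = np.random.default_rng(0)

def step(s, a):
    """Toy 1-d dynamics: reward is highest when the action cancels the state."""
    r = -(a + s) ** 2
    s_next = np.clip(s + a, -1.0, 1.0) + 0.05 * rng.standard_normal()
    return r, s_next

def phi(s):
    """State features, used both for the policy mean and the value baseline."""
    return np.array([1.0, s, s * s])

SIGMA = 0.3  # fixed exploration std-dev of the Gaussian policy

def grad_log_pi(theta, s, a):
    """Compatible features: gradient of log N(a; theta^T phi(s), SIGMA^2) w.r.t. theta."""
    return (a - theta @ phi(s)) / SIGMA ** 2 * phi(s)

theta = np.zeros(3)
gamma, alpha = 0.95, 0.05

for it in range(200):
    # 1. Collect a batch of transitions under the current policy.
    S, A, R = [], [], []
    s = rng.uniform(-1.0, 1.0)
    for t in range(200):
        a = theta @ phi(s) + SIGMA * rng.standard_normal()
        r, s_next = step(s, a)
        S.append(s); A.append(a); R.append(r)
        s = s_next

    # 2. Regression-based critic: fit discounted returns with the model
    #    G(s,a) ~ v^T phi(s) + w^T grad_log_pi(s,a). The advantage weights w
    #    over the compatible features equal the natural policy gradient.
    G, g = [], 0.0
    for r in reversed(R):
        g = r + gamma * g
        G.append(g)
    G = np.array(G[::-1])
    X = np.array([np.concatenate([phi(si), grad_log_pi(theta, si, ai)])
                  for si, ai in zip(S, A)])
    coef, *_ = np.linalg.lstsq(X, G, rcond=None)
    w = coef[3:]

    # 3. Natural-gradient actor update: step along w directly.
    theta += alpha * w

print("learned policy weights:", theta)  # ideally close to [0, -1, 0]

In the paper's FNAC, step 2 would instead be a fitted-value-iteration-style regression with importance sampling, which is what allows the same batch of samples to be reused across successive policy updates.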

Citation (APA)

Melo, F. S., & Lopes, M. (2008). Fitted natural actor-critic: A new algorithm for continuous state-action MDPs. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5212 LNAI, pp. 66–81). https://doi.org/10.1007/978-3-540-87481-2_5
