Intensive versus non-intensive actor-critic reinforcement learning algorithms

Abstract

Reinforcement learning algorithms usually employ an agent's consecutive actions to construct gradient estimators that adjust the agent's policy. The policy is thus the result of a form of stochastic approximation. Because stochastic approximation is slow, such algorithms are usually far too slow for applications such as real-time adaptive control. In this paper we analyze replacing stochastic approximation with estimation based on the entire available history of the agent-environment interaction. We design a reinforcement learning algorithm for continuous state/action domains that is orders of magnitude faster than the classical methods.
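
The following sketch (Python, not from the paper) is meant only to illustrate the contrast the abstract draws: a non-intensive learner makes one small stochastic-approximation step per observed action, while an intensive learner re-estimates the improvement direction from the entire stored history at every step. The toy one-state problem, the Gaussian policy, the running baseline, and all parameter values are illustrative assumptions, and the self-normalized importance weighting is a simplification rather than the authors' exact estimator.

import numpy as np

rng = np.random.default_rng(0)
SIGMA = 0.5                       # fixed exploration noise (illustrative value)

def reward(action):
    return -(action - 1.0) ** 2   # toy reward, maximized at action = 1

def grad_log_pi(mu, action):
    # derivative with respect to mu of log N(action; mu, SIGMA^2)
    return (action - mu) / SIGMA ** 2

def non_intensive(steps=300, lr=0.05):
    # Classical scheme: one stochastic-approximation step per observed action,
    # with a running reward baseline standing in for the critic.
    mu, baseline = 0.0, 0.0
    for _ in range(steps):
        a = rng.normal(mu, SIGMA)
        r = reward(a)
        mu += lr * (r - baseline) * grad_log_pi(mu, a)
        baseline += 0.1 * (r - baseline)
    return mu

def intensive(steps=300, lr=0.2):
    # "Intensive" scheme: at every step, re-estimate the gradient from the
    # entire stored history, reweighting each sample from the policy it was
    # drawn under to the current one (self-normalized importance sampling;
    # a simplification, not the paper's exact estimator).
    mu, history = 0.0, []         # history holds (action, policy mean at draw time)
    for _ in range(steps):
        a = rng.normal(mu, SIGMA)
        history.append((a, mu))
        actions = np.array([x for x, _ in history])
        old_mus = np.array([m for _, m in history])
        w = np.exp(((actions - old_mus) ** 2 - (actions - mu) ** 2)
                   / (2 * SIGMA ** 2))
        rewards = reward(actions)
        baseline = np.average(rewards, weights=w)
        grad = np.average((rewards - baseline) * grad_log_pi(mu, actions),
                          weights=w)
        mu += lr * grad
    return mu

print("non-intensive policy mean:", non_intensive())
print("intensive policy mean:    ", intensive())

The intended takeaway of the sketch is that the non-intensive update extracts a single small adjustment from each sample and then discards it, whereas the intensive update keeps reusing every stored interaction, which is the source of the speed-up the abstract claims.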

Citation (APA)

Wawrzynski, P., & Pacut, A. (2004). Intensive versus non-intensive actor-critic reinforcement learning algorithms. In Lecture Notes in Artificial Intelligence (Subseries of Lecture Notes in Computer Science) (Vol. 3070, pp. 934–941). Springer Verlag. https://doi.org/10.1007/978-3-540-24844-6_145
