Reinforcement learning algorithms typically use consecutive agent actions to construct gradient estimators that adjust the agent's policy. The policy is then the result of a form of stochastic approximation. Because stochastic approximation converges slowly, such algorithms are usually far too slow for applications such as real-time adaptive control. In this paper we analyze replacing stochastic approximation with estimation based on the entire available history of agent-environment interaction. We design a reinforcement learning algorithm for continuous state and action spaces that is orders of magnitude faster than the classical methods.
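The core contrast in the abstract, between a one-sample-at-a-time stochastic-approximation update and an estimate built from the whole stored interaction history, can be illustrated with a minimal sketch. This is not the authors' algorithm; it is a toy example (estimating a mean reward) where both the step size and the reward distribution are assumptions chosen for illustration.

```python
import random

random.seed(0)

def stochastic_approximation(samples, step=0.05):
    """Classical incremental scheme: each sample is used once,
    then discarded. Convergence is governed by the step size."""
    estimate = 0.0
    for r in samples:
        estimate += step * (r - estimate)
    return estimate

def full_history_estimate(samples):
    """Estimate recomputed from the entire stored history
    (here simply the sample mean), reusing every observation."""
    return sum(samples) / len(samples)

# Hypothetical reward stream with true mean 1.0.
rewards = [random.gauss(1.0, 0.5) for _ in range(1000)]

sa = stochastic_approximation(rewards)
fh = full_history_estimate(rewards)
# The full-history estimate has error shrinking as 1/sqrt(n),
# while the incremental estimate's error floor is set by the step size.
```

The same trade-off carries over to policy adjustment: reusing all past interactions extracts more information per environment step, at the cost of storing and reprocessing the history.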
CITATION STYLE
Wawrzynski, P., & Pacut, A. (2004). Intensive versus non-intensive actor-critic reinforcement learning algorithms. In Lecture Notes in Artificial Intelligence (Subseries of Lecture Notes in Computer Science) (Vol. 3070, pp. 934–941). Springer Verlag. https://doi.org/10.1007/978-3-540-24844-6_145