Reinforcement learning algorithms typically use consecutive agent actions to construct gradient estimators that adjust the agent's policy. The policy is then the result of a form of stochastic approximation. Because stochastic approximation converges slowly, such algorithms are usually far too slow for applications such as real-time adaptive control. In this paper we analyze replacing stochastic approximation with estimation based on the entire available history of agent-environment interaction. We design a reinforcement learning algorithm for continuous state and action spaces that is orders of magnitude faster than the classical methods.
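The core contrast in the abstract, between a one-sample-at-a-time stochastic-approximation update and an estimate built from the whole stored interaction history, can be illustrated with a minimal sketch. This is not the authors' algorithm; it is a toy example (estimating a mean reward) where both the step size and the reward distribution are assumptions chosen for illustration.

```python
import random

random.seed(0)

def stochastic_approximation(samples, step=0.05):
    """Classical incremental scheme: each sample is used once,
    then discarded. Convergence is governed by the step size."""
    estimate = 0.0
    for r in samples:
        estimate += step * (r - estimate)
    return estimate

def full_history_estimate(samples):
    """Estimate recomputed from the entire stored history
    (here simply the sample mean), reusing every observation."""
    return sum(samples) / len(samples)

# Hypothetical reward stream with true mean 1.0.
rewards = [random.gauss(1.0, 0.5) for _ in range(1000)]

sa = stochastic_approximation(rewards)
fh = full_history_estimate(rewards)
# The full-history estimate has error shrinking as 1/sqrt(n),
# while the incremental estimate's error floor is set by the step size.
```

The same trade-off carries over to policy adjustment: reusing all past interactions extracts more information per environment step, at the cost of storing and reprocessing the history.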
CITATION STYLE
Wawrzynski, P., & Pacut, A. (2004). Intensive versus non-intensive actor-critic reinforcement learning algorithms. In Lecture Notes in Artificial Intelligence (Subseries of Lecture Notes in Computer Science) (Vol. 3070, pp. 934–941). Springer Verlag. https://doi.org/10.1007/978-3-540-24844-6_145