Abstract
We provide some general results on the convergence of a class of stochastic approximation algorithms and their parallel and asynchronous variants. We then use these results to study the Q-learning algorithm, a reinforcement learning method for solving Markov decision problems, and establish its convergence under conditions more general than previously available.
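The abstract's subject, asynchronous Q-learning with a diminishing stepsize, can be illustrated on a toy problem. The sketch below is a minimal tabular Q-learning loop on a hypothetical two-state deterministic MDP; the MDP itself, the `1/n` stepsize schedule, and the uniform-random exploration policy are illustrative assumptions, not details taken from the paper. Convergence is checked against value iteration on the same model.

```python
import random

# Hypothetical toy MDP (illustrative, not from the paper):
# 2 states, 2 actions, deterministic transitions and rewards.
P = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}          # next state
R = {(0, 0): 0.0, (0, 1): 1.0, (1, 0): 2.0, (1, 1): 0.0}  # reward
GAMMA = 0.5  # discount factor

def q_learning(n_steps=100_000, seed=0):
    """Asynchronous tabular Q-learning: one (state, action) entry
    is updated per step, with stepsize 1/(number of visits)."""
    rng = random.Random(seed)
    Q = {sa: 0.0 for sa in P}
    visits = {sa: 0 for sa in P}
    s = 0
    for _ in range(n_steps):
        a = rng.choice([0, 1])              # uniform exploration
        s2, r = P[(s, a)], R[(s, a)]
        visits[(s, a)] += 1
        alpha = 1.0 / visits[(s, a)]        # diminishing stepsize
        target = r + GAMMA * max(Q[(s2, b)] for b in (0, 1))
        Q[(s, a)] += alpha * (target - Q[(s, a)])
        s = s2
    return Q

def value_iteration(n_iters=100):
    """Exact fixed point of the Bellman operator, for comparison."""
    Q = {sa: 0.0 for sa in P}
    for _ in range(n_iters):
        Q = {(s, a): R[(s, a)]
             + GAMMA * max(Q[(P[(s, a)], b)] for b in (0, 1))
             for (s, a) in P}
    return Q
```

On this model the optimal values can be computed by hand (for example, Q*(1, 0) = 2 + 0.5 · 4 = 4), and the asynchronous iterates approach them as the paper's conditions would predict.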
Citation (APA)
Tsitsiklis, J. N. (1993). Asynchronous stochastic approximation and Q-learning. In Proceedings of the IEEE Conference on Decision and Control (Vol. 1, pp. 395–400). IEEE. https://doi.org/10.1007/bf00993306