We provide some general results on the convergence of a class of stochastic approximation algorithms and their parallel and asynchronous variants. We then use these results to study the Q-learning algorithm, a reinforcement learning method for solving Markov decision problems, and establish its convergence under conditions more general than previously available.
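The algorithm the abstract refers to can be illustrated with a minimal sketch of asynchronous Q-learning: at each step only the visited (state, action) entry is updated, using a diminishing stochastic-approximation step size. The toy two-state MDP, its reward structure, and all names below are illustrative assumptions, not taken from the paper.

```python
import random

GAMMA = 0.9                 # discount factor (assumed for the toy example)
N_STATES, N_ACTIONS = 2, 2

def step(state, action):
    """Hypothetical transition/reward model for the toy MDP."""
    next_state = (state + action) % N_STATES
    reward = 1.0 if next_state == 1 else 0.0
    return next_state, reward

def q_learning(n_steps=5000, seed=0):
    rng = random.Random(seed)
    # Q-table and per-entry visit counts; only one entry is updated per step,
    # which is the asynchronous aspect analyzed in the paper.
    Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    visits = [[0] * N_ACTIONS for _ in range(N_STATES)]
    state = 0
    for _ in range(n_steps):
        action = rng.randrange(N_ACTIONS)        # exploratory behavior policy
        next_state, reward = step(state, action)
        visits[state][action] += 1
        alpha = 1.0 / visits[state][action]      # diminishing step size
        target = reward + GAMMA * max(Q[next_state])
        Q[state][action] += alpha * (target - Q[state][action])
        state = next_state
    return Q
```

Under the usual conditions (every entry updated infinitely often, step sizes summing to infinity with summable squares), updates of this form converge to the optimal Q-values.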
Tsitsiklis, J. N. (1994). Asynchronous stochastic approximation and Q-learning. Machine Learning, 16(3), 185–202. https://doi.org/10.1007/bf00993306