In this paper, a novel Q-learning based policy iteration adaptive dynamic programming (ADP) algorithm is developed to solve optimal control problems for discrete-time nonlinear systems. The idea is to use a policy iteration ADP technique to construct an iterative control law that stabilizes the system and simultaneously minimizes the iterative Q function. The convergence property is analyzed to show that the iterative Q function is monotonically non-increasing and converges to the solution of the optimality equation. Finally, simulation results are presented to demonstrate the performance of the developed algorithm.
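The abstract does not state the iteration equations, but the kind of scheme it describes can be sketched as follows: given an admissible control law v_i, evaluate its Q function from Q_i(x,u) = U(x,u) + Q_i(x', v_i(x')), then improve the policy by v_{i+1}(x) = argmin_u Q_i(x,u), which yields a monotonically non-increasing Q-function sequence. The dynamics, utility function, grids, and initial policy below are illustrative assumptions on a discretized toy system, not the paper's benchmark:

```python
import numpy as np

# Hypothetical discrete-time nonlinear system and quadratic utility
# (illustrative only; the paper's example system is not given in the abstract).
f = lambda x, u: 0.8 * np.sin(x) + u        # x_{k+1} = f(x_k, u_k)
U = lambda x, u: x**2 + u**2                # stage cost U(x_k, u_k)

xs = np.linspace(-1.0, 1.0, 41)             # discretized state grid
us = np.linspace(-1.0, 1.0, 41)             # discretized control grid

def nearest(grid, v):
    """Index of the nearest grid point for each entry of v."""
    return np.abs(grid - np.clip(v, grid[0], grid[-1])[..., None]).argmin(-1)

# Initial admissible (stabilizing) control law: cancel the drift term.
policy = nearest(us, -0.8 * np.sin(xs))     # control index for each state

X, Uc = np.meshgrid(xs, us, indexing="ij")
stage = U(X, Uc)                            # U(x, u) on the grid
ix_next = nearest(xs, f(X, Uc))             # grid index of x' = f(x, u)

Q_hist = []
for _ in range(6):                          # outer policy-iteration loop
    # Policy evaluation: fixed-point iteration on
    # Q(x, u) = U(x, u) + Q(x', v_i(x')), starting from Q = 0.
    Q = np.zeros_like(stage)
    for _ in range(200):
        Q = stage + Q[ix_next, policy[ix_next]]
    Q_hist.append(Q.copy())
    # Policy improvement: v_{i+1}(x) = argmin_u Q_i(x, u).
    policy = Q.argmin(axis=1)
```

With an admissible initial policy, each `Q_hist[i+1]` is pointwise no larger than `Q_hist[i]`, mirroring the non-increasing convergence the paper proves; the discretization here just makes that behavior easy to inspect numerically.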
Citation
Wei, Q., & Liu, D. (2015). A new discrete-time iterative adaptive dynamic programming algorithm based on Q-learning. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9377 LNCS, pp. 43–52). Springer Verlag. https://doi.org/10.1007/978-3-319-25393-0_6