In this chapter, generalized policy iteration (GPI) algorithms are developed to solve infinite-horizon optimal control problems for discrete-time nonlinear systems. GPI algorithms combine the ideas of the policy iteration and value iteration algorithms of adaptive dynamic programming (ADP): they can be initialized with an arbitrary positive semidefinite function, and two interleaved iterations are used for policy evaluation and policy improvement, respectively. The monotonicity, convergence, admissibility, and optimality properties of the developed GPI algorithms for discrete-time nonlinear systems are then analyzed. To implement the GPI algorithms, neural networks are employed to approximate the iterative value functions and compute the iterative control laws, yielding an approximate optimal control law. Simulation examples are included to verify the effectiveness of the developed algorithms.
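The interleaving of policy evaluation and policy improvement described above can be sketched on a toy problem. The fragment below is a minimal, hedged illustration of the GPI loop on a small finite-state shortest-path problem with a discounted stage cost; the chapter's actual setting (continuous-state nonlinear dynamics with neural-network value approximation) is more general, and all names here (`n_states`, `step`, `cost`, the sweep counts) are illustrative assumptions, not taken from the chapter.

```python
import numpy as np

# Illustrative GPI sketch on a deterministic 6-state chain (not the
# chapter's neural-network implementation). States 0..5; the goal is
# state 5. Action 0 = stay, action 1 = move one step toward the goal.
n_states, n_actions = 6, 2
goal = n_states - 1

def step(s, a):
    """Deterministic transition: action 1 moves one state toward the goal."""
    return min(s + a, goal)

def cost(s, a):
    """Stage cost: zero at the goal, one elsewhere."""
    return 0.0 if s == goal else 1.0

gamma = 0.9
V = np.zeros(n_states)                 # arbitrary PSD initialization (here, zero)
policy = np.zeros(n_states, dtype=int) # initial control law: stay everywhere

for _ in range(50):  # outer GPI loop
    # Policy evaluation: a few fixed-point sweeps under the current policy.
    for _ in range(3):
        V = np.array([cost(s, policy[s]) + gamma * V[step(s, policy[s])]
                      for s in range(n_states)])
    # Policy improvement: greedy (cost-minimizing) one-step lookahead.
    policy = np.array([min(range(n_actions),
                           key=lambda a, s=s: cost(s, a) + gamma * V[step(s, a)])
                       for s in range(n_states)], dtype=int)

print(policy)  # moves right (action 1) at every state before the goal
```

Varying the number of inner evaluation sweeps recovers the two extremes GPI interpolates between: one sweep per outer loop behaves like value iteration, while evaluating to convergence recovers policy iteration.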
Liu, D., Wei, Q., Wang, D., Yang, X., & Li, H. (2017). Generalized policy iteration ADP for discrete-time nonlinear systems. In Advances in Industrial Control (pp. 177–221). Springer International Publishing. https://doi.org/10.1007/978-3-319-50815-3_5