In this chapter, online adaptive reinforcement-learning-based solutions are developed for infinite-horizon optimal control problems for continuous-time uncertain nonlinear systems. An actor-critic-identifier structure is developed to approximate the solution to the Hamilton–Jacobi–Bellman equation using three neural network structures. The actor and the critic neural networks approximate the optimal control and the optimal value function, respectively, and a robust dynamic neural network identifier asymptotically approximates the uncertain system dynamics. An advantage of using the actor-critic-identifier architecture is that learning by the actor, critic, and identifier is continuous and concurrent, without requiring knowledge of the system drift dynamics. Convergence of the algorithm is analyzed using Lyapunov-based adaptive control methods. A persistence of excitation condition is required to guarantee exponential convergence to a bounded region in a neighborhood of the optimal control and uniformly ultimately bounded stability of the closed-loop system. The developed actor-critic method is extended to solve trajectory tracking problems under the assumption that the system dynamics are completely known. The actor-critic-identifier architecture is also extended to generate approximate feedback-Nash equilibrium solutions to N-player nonzero-sum differential games. Simulation results are provided to demonstrate the performance of the developed actor-critic-identifier method.
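As a rough illustration of the actor-critic-identifier idea, the following sketch runs the three learners concurrently on a scalar linear-quadratic example. The gains, the probing signal, the single basis function σ(x) = x², and the simplified update laws (a normalized gradient critic and an actor that simply tracks the critic) are illustrative assumptions, not the chapter's exact adaptive laws.

```python
import numpy as np

# Scalar example:  x_dot = a*x + b*u,  running cost r = x^2 + u^2 (Q = R = 1).
# Value-function basis sigma(x) = x^2, so V_hat(x) = W_c * x^2 and the actor's
# feedback u = -(1/2) R^{-1} b dV_hat/dx reduces to u = -b * W_a * x.
a, b = -1.0, 1.0        # true drift (treated as unknown) and input gain
theta = 0.0             # identifier's estimate of the drift parameter a
W_c, W_a = 0.5, 0.5     # critic and actor weights
x, x_hat = 2.0, 2.0     # plant state and identifier state
dt = 1e-3

for i in range(30_000):  # 30 s of simulated time
    t = i * dt
    # actor's control plus a small probing term for excitation
    u = -b * W_a * x + 0.1 * np.sin(3.0 * t)

    # identifier: series-parallel estimator of the unknown drift term
    e = x - x_hat
    x_hat += dt * (theta * x + b * u + 10.0 * e)
    theta += dt * 20.0 * e * x

    # critic: normalized gradient descent on the Bellman residual
    f_hat = theta * x + b * u           # identified model of x_dot
    omega = 2.0 * x * f_hat             # d(residual)/d(W_c)
    delta = x**2 + u**2 + W_c * omega   # Bellman (HJB) residual
    W_c += dt * (-5.0 * delta * omega / (1.0 + omega**2))

    # actor: pulled toward the critic's estimate (a simplification of
    # the gradient-based actor law)
    W_a += dt * (-5.0 * (W_a - W_c))

    # integrate the true plant (Euler)
    x += dt * (a * x + b * u)
```

For this problem the Riccati equation gives V(x) = (√2 − 1)x² ≈ 0.414 x², so W_c and W_a should settle near 0.414 while θ approaches a = −1; the probing term keeps the estimates only uniformly ultimately bounded near these values, mirroring the convergence result described above.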
Kamalapurkar, R., Walters, P., Rosenfeld, J., & Dixon, W. (2018). Excitation-based online approximate optimal control. In Communications and Control Engineering (pp. 43–98). Springer International Publishing. https://doi.org/10.1007/978-3-319-78384-0_3