Reinforcement Learning for Optimal Feedback Control

  • Kamalapurkar R
  • Walters P
  • Rosenfeld J
  • Dixon W

Abstract

Making the best possible decision according to some desired set of criteria is always difficult. Such decisions are even more difficult when there are time constraints, and can be impossible when there is uncertainty in the system model. Yet the ability to make such decisions can enable higher levels of autonomy in robotic systems and, as a result, have dramatic impacts on society. Given this motivation, various mathematical theories have been developed related to concepts such as optimality, feedback control, and adaptation/learning. This book describes how such theories can be used to develop optimal (i.e., the best possible) controllers/policies (i.e., the decisions) for a particular class of problems. Specifically, this book is focused on the development of concurrent, real-time learning and execution of approximate optimal policies for infinite-horizon optimal control problems for continuous-time deterministic uncertain nonlinear systems. The developed approximate optimal controllers are based on reinforcement learning, where learning occurs through an actor–critic-based reward system. Detailed attention to control-theoretic concerns such as convergence and stability differentiates this book from the large body of existing literature on reinforcement learning. Moreover, both model-free and model-based methods are developed. The model-based methods are motivated by the idea that a system can be controlled better as more knowledge is available about the system. To account for uncertainty in the model, typical actor–critic reinforcement learning is augmented with unique model identification methods. The optimal policies in this book are derived from dynamic programming methods; hence, they suffer from the curse of dimensionality. To address the computational demands of such an approach, a unique function approximation strategy is provided that significantly reduces the number of required kernels, along with parallel learning through novel state extrapolation strategies. The material is intended for readers who have a basic understanding of nonlinear analysis tools such as Lyapunov-based methods. The development and results may help to support educators, practitioners, and researchers with nonlinear systems/control, optimal control, and intelligent/adaptive control interests working in aerospace engineering, computer science, electrical engineering, industrial …

… uncertain nonlinear systems and to generate approximate feedback-Nash equilibrium solutions to N-player nonzero-sum differential games.

Chapter 5 discusses the formulation and online approximate feedback-Nash equilibrium solution of an optimal formation tracking problem. A relative control error minimization technique is introduced to facilitate the formulation of a feasible infinite-horizon total-cost differential graphical game. A dynamic programming-based feedback-Nash equilibrium solution to the differential graphical game is obtained via the development of a set of coupled Hamilton–Jacobi equations. The developed approximate feedback-Nash equilibrium solution is analyzed using a Lyapunov-based stability analysis to yield formation tracking in the presence of uncertainties. In addition to control, this chapter also explores applications of differential graphical games to monitoring the behavior of neighboring agents in a network.
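The chapter summaries above and below all build on the same infinite-horizon problem class described at the start of the abstract. As a rough orientation only, a minimal sketch in generic notation follows; the dynamics f and g, the cost weights Q and R, and the basis σ are placeholders rather than the book's exact definitions.

```latex
% Generic infinite-horizon optimal regulation sketch (illustrative notation only;
% not reproduced from the book).
\begin{align*}
  &\dot{x} = f(x) + g(x)\,u, \qquad
   J\bigl(x_0, u(\cdot)\bigr) = \int_{0}^{\infty}
     \bigl( x^{\top} Q x + u^{\top} R u \bigr)\, dt, \\
  &V^{*}(x) = \min_{u(\cdot)} J\bigl(x, u(\cdot)\bigr), \qquad
   0 = \min_{u}\Bigl[\, x^{\top} Q x + u^{\top} R u
     + \nabla V^{*}(x)^{\top}\bigl(f(x) + g(x)\,u\bigr) \Bigr]
   \quad \text{(Hamilton--Jacobi--Bellman equation)}, \\
  &u^{*}(x) = -\tfrac{1}{2} R^{-1} g(x)^{\top} \nabla V^{*}(x), \\
  &\hat{V}(x) = \hat{W}_{c}^{\top}\sigma(x), \qquad
   \hat{u}(x) = -\tfrac{1}{2} R^{-1} g(x)^{\top} \nabla\sigma(x)^{\top} \hat{W}_{a}
   \quad \text{(critic and actor approximations)}, \\
  &\delta = x^{\top} Q x + \hat{u}(x)^{\top} R\,\hat{u}(x)
     + \hat{W}_{c}^{\top} \nabla\sigma(x)\bigl(f(x) + g(x)\,\hat{u}(x)\bigr)
   \quad \text{(Bellman error driving the weight updates)}.
\end{align*}
```

In this reading, the actor and critic weights are adapted online to drive the Bellman error toward zero; the model-based variants described in the abstract additionally identify the drift dynamics online and evaluate the Bellman error at extrapolated states to provide excitation.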
Chapter 6 focuses on applications of model-based reinforcement learning to closed-loop control of autonomous vehicles. The first part of the chapter is devoted to online approximation of the optimal station-keeping strategy for a fully actuated marine craft. The developed strategy is experimentally validated using an autonomous underwater vehicle, where the three degrees of freedom in the horizontal plane are regulated. The second part of the chapter is devoted to online approximation of an infinite-horizon optimal path-following strategy for a unicycle-type mobile robot. An approximate optimal guidance law is obtained through the application of model-based reinforcement learning and concurrent learning-based parameter estimation. Simulation results demonstrate that the developed method learns a controller that closely matches the optimal controller determined by an off-line numerical solver, and experimental results demonstrate the ability of the controller to learn the approximate solution in real time.

Motivated by computational issues arising in approximate dynamic programming, a function approximation method is developed in Chapter 7 that aims to approximate a function in a small neighborhood of a state that travels within a compact set. The development is based on the theory of universal reproducing kernel Hilbert spaces over the n-dimensional Euclidean space. Several theorems are introduced that support the development of this State Following (StaF) method. In particular, it is shown that there is a bound on the number of kernel functions required to maintain an accurate function approximation as a state moves through a compact set. Additionally, a weight update law based on gradient descent is introduced, where good accuracy can be achieved provided the weight update law is iterated at a high enough frequency (a minimal numerical sketch of this idea appears at the end of this excerpt). Simulation results are presented that demonstrate the utility of the StaF methodology for the maintenance of accurate function approximation as well as for solving the infinite-horizon optimal regulation problem. The results of the simulation indicate that fewer basis functions are required to guarantee stability and approximate optimality than are required when a global approximation approach is used.

The authors would like to express their sincere appreciation to a number of individuals whose support made the book possible. Numerous intellectual discussions and research support were provided by all of our friends and colleagues in the Nonlinear Controls and Robotics Laboratory at the University of Florida, with particular thanks to Shubhendu Bhasin, Patryk Deptula, Huyen Dinh, Keith Dupree, Nic Fischer, Marcus Johnson, Justin Klotz, and Anup Parikh. Inspiration and insights for our work were provided, in part, through discussions with and/or reading foundational literature by Bill Hager, Michael Jury, Paul Robinson, Frank Lewis (the academic grandfather or great-grandfather to several of the authors), Derong Liu, Anil Rao, Kyriakos Vamvoudakis, Richard Vinter, Daniel Liberzon, and Draguna Vrabie. The research strategies and breakthroughs described in this book would also not have been possible without funding support provided by research sponsors including NSF award numbers 0901491 and 1509516, Office of Naval Research grants N00014-13-1-0151 and N00014-16-1-2091, Prioria Robotics, and the Air Force Research Laboratory, Eglin AFB. Most importantly, we are eternally thankful for our families, who are unwavering in their love, support, and understanding.
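As a small, self-contained numerical illustration of the state-following idea summarized in the Chapter 7 paragraph above, the sketch below anchors a handful of Gaussian kernels to the current state and adapts their weights by gradient descent while the state moves along a trajectory. The target function, kernel form, center offsets, and step size are illustrative assumptions, not the book's construction.

```python
import numpy as np

# Illustrative target function to approximate locally (an assumption for this sketch).
def target(x):
    return np.sin(x[0]) + 0.5 * x[1] ** 2

def staf_kernels(x, centers, width=0.5):
    """Gaussian kernels evaluated at the point x, centered at `centers`."""
    diffs = centers - x                                    # shape (num_kernels, 2)
    return np.exp(-np.sum(diffs ** 2, axis=1) / (2.0 * width ** 2))

rng = np.random.default_rng(0)
num_kernels = 5                                            # few kernels: the fit only needs to be local
offsets = rng.uniform(-0.3, 0.3, size=(num_kernels, 2))    # kernel centers relative to the current state
weights = np.zeros(num_kernels)
step = 0.5                                                 # gradient-descent step size (assumed)

# The state travels around the unit circle (a compact set); the weights are updated
# often enough that the local approximation error stays small as the centers follow it.
for k in range(2000):
    t = 0.005 * k
    x = np.array([np.cos(t), np.sin(t)])                   # current state
    centers = x + offsets                                   # state-following kernel centers
    phi = staf_kernels(x, centers)
    error = weights @ phi - target(x)                        # instantaneous approximation error
    weights -= step * error * phi                            # gradient descent on 0.5 * error**2

print(f"final local approximation error: {abs(error):.4f}")
```

The intent is only to show that a few kernels anchored to a moving state can maintain an accurate local fit, whereas a comparable global approximation over the whole compact set would typically require many more basis functions, which is the computational motivation cited in the abstract.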

Cite

APA

Kamalapurkar, R., Walters, P., Rosenfeld, J., & Dixon, W. (2018). Reinforcement Learning for Optimal Feedback Control. https://doi.org/10.1007/978-3-319-78384-0
