Reinforcement learning of optimal controls

John K. Williams

Book Chapter

Springer Netherlands (2009), pp. 297–327

DOI: 10.1007/978-1-4020-9119-3_15

Abstract

As humans, we continually interpret sensory input to try to make sense of the world around us; that is, we develop mappings from observations to a useful estimate of the environmental state. A number of artificial intelligence methods for producing such mappings are described in this book, along with applications showing how they may be used to better understand a physical phenomenon or contribute to a decision support system. However, we do not simply want to understand the world around us. Rather, we interact with it to accomplish certain goals: for instance, to obtain food, water, warmth, shelter, status or wealth. Learning how to accurately estimate the state of our environment is intimately tied to how we then use that knowledge to manipulate it. Our actions change the environmental state and generate positive or negative feedback, which we evaluate and use to inform our future behavior in a continuing cycle of observation, action, environmental change and feedback. In the field of machine learning, this common human experience is abstracted as a learning agent whose purpose is to discover, through interaction with its environment, how to act to achieve its goals. In general, no teacher is available to supply correct actions, nor will feedback always be immediate. Instead, the learner must use the sequence of experiences resulting from its actions to determine which actions to repeat and which to avoid. In doing so, it must be able to assign credit or blame to actions that may be long past, and it must balance the exploitation of knowledge previously gained with the need to explore untried, possibly superior strategies. Reinforcement learning, also called stochastic dynamic programming, is the area of machine learning devoted to solving this general learning problem.
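
The learning problem described above (act, receive possibly delayed feedback, assign credit to earlier actions, and balance exploitation against exploration) can be made concrete with tabular Q-learning, one standard reinforcement learning algorithm. This is a minimal sketch under stated assumptions, not the chapter's own method: the corridor environment, the function names `step`, `q_learning` and `greedy_policy`, and the parameter values `alpha`, `gamma` and `epsilon` are all hypothetical choices made for illustration.

```python
import random

# Hypothetical toy environment (assumed for illustration): a corridor of
# N_STATES cells. The agent starts in cell 0; action 0 moves left, action 1
# moves right. The only reward (+1) comes on reaching the last cell, so
# feedback is delayed until the end of each episode.
N_STATES = 6
ACTIONS = (0, 1)


def step(state, action):
    """Return (next_state, reward, done) for the toy corridor."""
    if action == 0:
        next_state = max(0, state - 1)
    else:
        next_state = min(N_STATES - 1, state + 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration (hypothetical parameters)."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Explore with probability epsilon (or when estimates are tied);
            # otherwise exploit the current action-value estimates.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.choice(ACTIONS)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, action)
            # Temporal-difference target: immediate reward plus the discounted
            # value of the best next action (nothing follows a terminal state).
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Best action in each non-terminal cell (ties broken toward 'right')."""
    return [0 if qs[0] > qs[1] else 1 for qs in q[:-1]]


print(greedy_policy(q_learning()))  # expected: [1, 1, 1, 1, 1]
```

Only the final move into the last cell is ever rewarded directly, yet the learned policy moves right from every cell: the temporal-difference update propagates credit backward to actions taken long before the reward, while the `epsilon` term keeps the agent occasionally trying actions it currently rates as inferior.
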
Although the term reinforcement learning has traditionally been used in a number of contexts, the modern field is the result of a synthesis in the 1980s of ideas from optimal control theory, animal learning, and temporal difference methods from artificial intelligence. Finding a mapping that prescribes actions based on measured environmental states in a way that optimizes some long-term measure of success is the subject of what mathematicians and engineers call optimal control problems and psychologists call planning problems. There is a deep body of mathematical literature on optimal control theory describing how to analyze a system and develop optimal mappings. However, in many applications the system is poorly understood, complex, difficult to analyze mathematically, or changing in time. In such cases, a machine learning approach that learns a good control strategy from real or simulated experience may be the only practical option (Si et al. 2004). © 2009 Springer Netherlands.
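
When a system's dynamics are fully known, the classical optimal control route applies: dynamic programming can compute the optimal state-to-action mapping directly, with no trial-and-error experience. As a minimal sketch, assuming a hypothetical deterministic corridor (the names `transition`, `action_values`, `value_iteration` and the discount `GAMMA` are illustrative choices, not taken from the chapter), in-place value iteration looks like this:

```python
# Hypothetical corridor model, assumed fully known: cells 0..GOAL, action 0
# moves left, action 1 moves right, and reaching GOAL yields reward +1 and
# ends the episode.
N_STATES = 6
GOAL = N_STATES - 1
ACTIONS = (0, 1)
GAMMA = 0.9  # discount factor (illustrative value)


def transition(state, action):
    """Known model: return (next_state, reward) for a state-action pair."""
    if action == 0:
        next_state = max(0, state - 1)
    else:
        next_state = min(GOAL, state + 1)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward


def action_values(v, state):
    """One-step lookahead: reward plus discounted successor value, per action."""
    values = []
    for action in ACTIONS:
        next_state, reward = transition(state, action)
        values.append(reward + GAMMA * v[next_state])
    return values


def value_iteration(tol=1e-10):
    """In-place (Gauss-Seidel) value iteration; returns (values, greedy policy)."""
    v = [0.0] * N_STATES  # v[GOAL] stays 0 because the episode ends there
    while True:
        delta = 0.0
        for s in range(GOAL):
            best = max(action_values(v, s))
            delta = max(delta, abs(best - v[s]))
            v[s] = best
        if delta < tol:
            break
    policy = []
    for s in range(GOAL):
        values = action_values(v, s)
        policy.append(values.index(max(values)))
    return v, policy


values, policy = value_iteration()
print(policy)  # expected: [1, 1, 1, 1, 1]
```

The catch, as the paragraph above notes, is that `transition` must be known and tractable in advance. Reinforcement learning methods instead estimate comparable action values from sampled real or simulated experience, which is what makes them attractive when the system is poorly understood or changing.
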

Citation (APA)

Williams, J. K. (2009). Reinforcement learning of optimal controls. In Artificial Intelligence Methods in the Environmental Sciences (pp. 297–327). Springer Netherlands. https://doi.org/10.1007/978-1-4020-9119-3_15