Abstract
Although several variants of stochastic dynamic programming have been applied to optimal operation of multireservoir systems, they have been plagued by a high-dimensional state space and the inability to accurately incorporate the stochastic environment as characterized by temporally and spatially correlated hydrologic inflows. Reinforcement learning has emerged as an effective approach to solving sequential decision problems by combining concepts from artificial intelligence, cognitive science, and operations research. A reinforcement learning system has a mathematical foundation similar to dynamic programming and Markov decision processes, with the goal of maximizing the long-term return as conditioned on the state of the system environment and the immediate reward obtained from operational decisions. Reinforcement learning can rely on Monte Carlo simulation when transition probabilities and rewards are not explicitly known a priori. The Q-Learning method in reinforcement learning is demonstrated on the two-reservoir Geum River system, South Korea, and is shown to outperform implicit stochastic dynamic programming and sampling stochastic dynamic programming methods. Copyright 2007 by the American Geophysical Union.
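As a rough illustration of the tabular Q-Learning scheme the abstract refers to, the sketch below applies the standard Q-learning update to a hypothetical single-reservoir toy problem. The storage discretization, inflow distribution, reward function, and all parameter values are assumptions chosen for illustration, not the Geum River model or the authors' implementation. Sampling inflows inside the simulation step reflects the model-free, Monte Carlo character noted in the abstract: no transition probabilities are enumerated in advance.

```python
import random

# Hypothetical toy setup: one reservoir with discretized storage states and
# a small set of release decisions. All names and parameters are illustrative
# assumptions, not the configuration used in the paper.
STORAGE_LEVELS = 10          # discretized storage states 0..9
RELEASES = [0, 1, 2, 3]      # feasible release decisions
ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1

Q = {(s, a): 0.0 for s in range(STORAGE_LEVELS) for a in RELEASES}

def step(storage, release):
    """Simulate one period: sample a random inflow (Monte Carlo, no explicit
    transition probabilities) and compute an immediate reward."""
    inflow = random.randint(0, 3)                    # assumed stochastic inflow
    release = min(release, storage)                  # cannot release more than stored
    next_storage = min(storage - release + inflow, STORAGE_LEVELS - 1)
    reward = release * storage                       # toy benefit: head times release
    return next_storage, reward

storage = STORAGE_LEVELS // 2
for _ in range(50_000):
    # Epsilon-greedy action selection balances exploration and exploitation.
    if random.random() < EPSILON:
        action = random.choice(RELEASES)
    else:
        action = max(RELEASES, key=lambda a: Q[(storage, a)])
    next_storage, reward = step(storage, action)
    # Q-learning update: move Q(s,a) toward reward + gamma * max_a' Q(s',a').
    best_next = max(Q[(next_storage, a)] for a in RELEASES)
    Q[(storage, action)] += ALPHA * (reward + GAMMA * best_next - Q[(storage, action)])
    storage = next_storage

# Greedy operating policy implied by the learned action-value table.
policy = {s: max(RELEASES, key=lambda a: Q[(s, a)]) for s in range(STORAGE_LEVELS)}
print(policy)
```

Because the update uses only sampled transitions, the same loop structure extends to the multireservoir case in principle, though the state-action table grows with each added reservoir, which is the dimensionality issue the paper addresses.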
Citation
Lee, J. H., & Labadie, J. W. (2007). Stochastic optimization of multireservoir systems via reinforcement learning. Water Resources Research, 43(11). https://doi.org/10.1029/2006WR005627