PAC optimal planning for invasive species management: Improved exploration for reinforcement learning from simulator-defined MDPs

Thomas G. Dietterich; Majid Alkaee Taleghan; Mark Crowley

Conference Proceedings, Open Access

Proceedings of the 27th AAAI Conference on Artificial Intelligence, AAAI 2013 (2013) 1270-1276

DOI: 10.1609/aaai.v27i1.8487

Abstract

Often the most practical way to define a Markov Decision Process (MDP) is as a simulator that, given a state and an action, produces a resulting state and immediate reward sampled from the corresponding distributions. Simulators in natural resource management can be very expensive to execute, so that the time required to solve such MDPs is dominated by the number of calls to the simulator. This paper presents an algorithm, DDV, that combines improved confidence intervals on the Q values (as in interval estimation) with a novel upper bound on the discounted state occupancy probabilities to intelligently choose state-action pairs to explore. We prove that this algorithm terminates with a policy whose value is within ε of the optimal policy (with probability 1 − δ) after making only polynomially many calls to the simulator. Experiments on one benchmark MDP and on an MDP for invasive species management show very large reductions in the number of simulator calls required. Copyright © 2013, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
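As a rough illustration of the setting the abstract describes, the sketch below wraps a simulator-defined MDP so that every call is counted, then forms a simple confidence interval from sampled rewards. It is not the DDV algorithm and does not use the paper's improved intervals or its occupancy bound. The two-state toy MDP, the names `CountingSimulator`, `toy_simulator`, and `hoeffding_interval`, and the use of a plain Hoeffding bound on the mean immediate reward are all assumptions made for illustration only.

```python
import math
import random
from typing import Callable, List, Tuple

State = int
Action = int
# A simulator-defined MDP: (state, action) -> (sampled next state, sampled reward).
Simulator = Callable[[State, Action], Tuple[State, float]]


class CountingSimulator:
    """Wraps a simulator and counts how many times it is called."""

    def __init__(self, simulate: Simulator) -> None:
        self._simulate = simulate
        self.calls = 0

    def __call__(self, state: State, action: Action) -> Tuple[State, float]:
        self.calls += 1
        return self._simulate(state, action)


def toy_simulator(rng: random.Random) -> Simulator:
    """Hypothetical two-state MDP with rewards in [0, 1] (illustration only)."""

    def simulate(state: State, action: Action) -> Tuple[State, float]:
        # Action 1 reaches state 1 with probability 0.8, any other action with 0.2.
        p = 0.8 if action == 1 else 0.2
        next_state = 1 if rng.random() < p else 0
        reward = 1.0 if next_state == 1 else 0.0
        return next_state, reward

    return simulate


def hoeffding_interval(
    rewards: List[float], delta: float, r_max: float = 1.0
) -> Tuple[float, float]:
    """Two-sided Hoeffding interval for the mean of samples in [0, r_max].

    The true mean lies inside the interval with probability at least 1 - delta.
    """
    n = len(rewards)
    mean = sum(rewards) / n
    half_width = r_max * math.sqrt(math.log(2.0 / delta) / (2.0 * n))
    return mean - half_width, mean + half_width


# Sample one state-action pair 200 times and bound its mean immediate reward.
sim = CountingSimulator(toy_simulator(random.Random(0)))
rewards = [sim(0, 1)[1] for _ in range(200)]
low, high = hoeffding_interval(rewards, delta=0.05)
```

The counter `sim.calls` is the quantity the abstract treats as the dominant cost. The interval narrows only at a rate of one over the square root of the sample count, which suggests why choosing carefully which state-action pairs to sample matters when each call is expensive.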

Citation (APA)

Dietterich, T. G., Taleghan, M. A., & Crowley, M. (2013). PAC optimal planning for invasive species management: Improved exploration for reinforcement learning from simulator-defined MDPs. In Proceedings of the 27th AAAI Conference on Artificial Intelligence, AAAI 2013 (pp. 1270–1276). https://doi.org/10.1609/aaai.v27i1.8487