Heuristic search for generalized stochastic shortest path MDPs

72Citations
Citations of this article
48Readers
Mendeley users who have this article in their library.

Abstract

Research in efficient methods for solving infinite-horizon MDPs has so far concentrated primarily on discounted MDPs and the more general stochastic shortest path problems (SSPs). These are MDPs with 1) an optimal value function V*that is the unique solution of Bellman equation and 2) optimal policies that are the greedy policies w.r.t. V*. This paper's main contribution is the description of a new class of MDPs, that have well-defined optimal solutions that do not comply with either 1 or 2 above. We call our new class Generalized Stochastic Shortest Path (GSSP) problems. GSSP allows more general reward structure than SSP and subsumes several established MDP types including SSP, positive-bounded, negative, and discounted-reward models. While existing efficient heuristic search algorithms like LAO*and LRTDP are not guaranteed to converge to the optimal value function for GSSPs, we present a new heuristic-search-based family of algorithms, FRET (Find, Revise, Eliminate Traps). A preliminary empirical evaluation shows that FRET solves GSSPs much more efficiently than Value Iteration. Copyright © 2011, Association for the Advancement of Artificial Intelligence. All rights reserved.

Cite

CITATION STYLE

APA

Kolobov, A., Mausam, Weld, D. S., & Geffner, H. (2011). Heuristic search for generalized stochastic shortest path MDPs. In ICAPS 2011 - Proceedings of the 21st International Conference on Automated Planning and Scheduling (pp. 130–137). https://doi.org/10.1609/icaps.v21i1.13452

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free