Finding best k policies

Peng Dai; Judy Goldsmith

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2009) 5783 LNAI 144-155

DOI: 10.1007/978-3-642-04428-1_13

2 Citations

4 Readers

Abstract

An optimal probabilistic-planning algorithm solves a problem, usually modeled as a Markov decision process, by finding its optimal policy. In this paper, we study the k best policies problem: finding the k best policies of such a problem. For k > 1, the k best policies cannot be found directly using dynamic programming. Naïvely, finding the k-th best policy can be Turing-reduced to the optimal planning problem, but the number of problems queried by the naïve algorithm is exponential in k. We show empirically that solving the k best policies problem via this reduction requires unreasonable amounts of time even when k = 3. We then provide a new algorithm based on our theoretical contribution: a proof that the k-th best policy differs from the i-th best policy, for some i < k, on exactly one state. We show that the time complexity of the algorithm is quadratic in k, while the number of optimal planning problems it solves is linear in k. We demonstrate empirically that the new algorithm scales well. © 2009 Springer-Verlag Berlin Heidelberg.
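The one-state-difference property stated in the abstract suggests a simple neighbor search, sketched below in Python. This is an illustration only, not the authors' algorithm: it evaluates candidate policies directly instead of solving optimal planning problems, and it makes several assumptions that do not come from the paper. Policies are ranked by the sum of their discounted state values, no two policies tie, and the MDP is a small randomly generated one. All names (`random_mdp`, `evaluate`, `score`, `neighbors`, `k_best_policies`) are hypothetical.

```python
import itertools
import random


def random_mdp(n_states, n_actions, seed=0):
    """Build a small random MDP (assumed test fixture, not from the paper)."""
    rng = random.Random(seed)
    P, R = [], []
    for _ in range(n_states):
        rows = []
        for _ in range(n_actions):
            w = [rng.random() for _ in range(n_states)]
            total = sum(w)
            rows.append([x / total for x in w])  # normalized transition row
        P.append(rows)
        R.append([rng.random() for _ in range(n_actions)])
    return P, R


def evaluate(policy, P, R, gamma, sweeps=2000):
    """Iterative policy evaluation; returns one value per state."""
    n = len(policy)
    V = [0.0] * n
    for _ in range(sweeps):
        V = [R[s][policy[s]]
             + gamma * sum(P[s][policy[s]][t] * V[t] for t in range(n))
             for s in range(n)]
    return V


def score(policy, P, R, gamma):
    """Assumed ranking criterion: sum of state values under the policy."""
    return sum(evaluate(policy, P, R, gamma))


def neighbors(policy, n_actions):
    """Yield every policy that differs from `policy` on exactly one state."""
    for s, old in enumerate(policy):
        for a in range(n_actions):
            if a != old:
                yield policy[:s] + (a,) + policy[s + 1:]


def k_best_policies(P, R, gamma, k):
    n_states, n_actions = len(R), len(R[0])
    # Step 1: reach the best policy by single-state improvements.
    best = (0,) * n_states
    best_score = score(best, P, R, gamma)
    improved = True
    while improved:
        improved = False
        for q in neighbors(best, n_actions):
            sq = score(q, P, R, gamma)
            if sq > best_score:
                best, best_score, improved = q, sq, True
                break
    # Step 2: the next-best policy is a one-state neighbor of one already found.
    found = [best]
    candidates = {}  # policy -> score, for neighbors not yet selected
    while len(found) < k:
        for q in neighbors(found[-1], n_actions):
            if q not in found and q not in candidates:
                candidates[q] = score(q, P, R, gamma)
        if not candidates:
            break  # fewer than k policies exist
        nxt = max(candidates, key=candidates.get)
        del candidates[nxt]
        found.append(nxt)
    return found


if __name__ == "__main__":
    P, R = random_mdp(4, 2, seed=0)
    for rank, pi in enumerate(k_best_policies(P, R, 0.9, 5), start=1):
        print(rank, pi, round(score(pi, P, R, 0.9), 4))
```

Under these assumptions, each round only needs to score the one-state neighbors of the most recently selected policy, since the neighbors of earlier selections are already in the candidate pool. On a small MDP the result can be checked against sorting all policies produced by `itertools.product`.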

Cite

Dai, P., & Goldsmith, J. (2009). Finding best k policies. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5783 LNAI, pp. 144–155). https://doi.org/10.1007/978-3-642-04428-1_13
