Optimistic Value Iteration

Arnd Hartmanns; Benjamin Lucien Kaminski

Conference Proceedings, Open Access

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2020) 12225 LNCS 488-511

DOI: 10.1007/978-3-030-53291-8_26

22 citations, 4 readers

Abstract

Markov decision processes are widely used for planning and verification in settings that combine controllable or adversarial choices with probabilistic behaviour. The standard analysis algorithm, value iteration, only provides lower bounds on infinite-horizon probabilities and rewards. Two “sound” variations, which also deliver an upper bound, have recently appeared. In this paper, we present a new sound approach that leverages value iteration’s ability to usually deliver good lower bounds: we obtain a lower bound via standard value iteration, use the result to “guess” an upper bound, and prove the latter’s correctness. We present this optimistic value iteration approach for computing reachability probabilities as well as expected rewards. It is easy to implement and performs well, as we show via an extensive experimental evaluation using our implementation within the mcsta model checker of the Modest Toolset.
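
The three steps the abstract names (a lower bound from standard value iteration, an optimistic guess of an upper bound, and a proof that the guess is correct) can be illustrated with a minimal Python sketch for maximal reachability probabilities. It is not the mcsta implementation and leaves out expected rewards. The MDP encoding, the additive guess min(1, l + eps), the parameters alpha, eps, max_rounds and max_verify, and the halving of alpha after a refuted guess are all assumptions of this sketch. The correctness check rests on a standard fact: the Bellman operator Φ is monotone and the reachability probabilities are its least fixed point, so any vector u with Φ(u) ≤ u is a valid upper bound.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical MDP encoding (assumption of this sketch): each state maps to a
# non-empty list of actions; each action is a list of (probability, successor).
MDP = Dict[int, List[List[Tuple[float, int]]]]
Vector = Dict[int, float]


def bellman_max(mdp: MDP, targets: Set[int], v: Vector) -> Vector:
    """One application of the Bellman operator for maximal reachability."""
    out: Vector = {}
    for s, actions in mdp.items():
        if s in targets:
            out[s] = 1.0
        else:
            out[s] = max(sum(p * v[t] for p, t in act) for act in actions)
    return out


def value_iteration(mdp: MDP, targets: Set[int], v: Vector, alpha: float) -> Vector:
    """Standard VI: stop once two successive vectors differ by less than alpha.
    Started below the true values, the result is a lower bound, but the
    stopping criterion alone does not say how far below it is."""
    while True:
        nxt = bellman_max(mdp, targets, v)
        if max(abs(nxt[s] - v[s]) for s in mdp) < alpha:
            return nxt
        v = nxt


def optimistic_vi(mdp: MDP, targets: Set[int], alpha: float = 1e-8,
                  eps: float = 1e-6, max_rounds: int = 20,
                  max_verify: int = 1000) -> Tuple[Vector, Vector]:
    """Return (lower, upper) vectors bracketing the maximal reachability probabilities."""
    lower: Vector = {s: 1.0 if s in targets else 0.0 for s in mdp}
    for _ in range(max_rounds):
        # Step 1: lower bound via standard value iteration.
        lower = value_iteration(mdp, targets, lower, alpha)
        # Step 2: optimistically guess an upper bound slightly above it.
        upper = {s: min(1.0, lower[s] + eps) for s in mdp}
        # Step 3: verification phase, iterating both vectors.
        for _ in range(max_verify):
            new_upper = bellman_max(mdp, targets, upper)
            if all(new_upper[s] <= upper[s] for s in mdp):
                # Phi(u) <= u, so u lies above the least fixed point. Phi(u) is
                # inductive too (monotonicity), hence a valid, tighter bound.
                return lower, new_upper
            lower = bellman_max(mdp, targets, lower)
            upper = new_upper
            if any(lower[s] > upper[s] for s in mdp):
                break  # the original guess cannot have been an upper bound
        alpha /= 2  # retry with a stricter VI threshold (choice of this sketch)
    raise RuntimeError("no inductive upper bound found")


# Example: state 3 is the target, state 2 a losing sink, and the second
# action of state 0 is a self-loop, so {0} forms an end component.
example: MDP = {
    0: [[(0.5, 1), (0.3, 3), (0.2, 2)], [(1.0, 0)]],
    1: [[(0.5, 0), (0.4, 3), (0.1, 2)]],
    2: [[(1.0, 2)]],
    3: [[(1.0, 3)]],
}
lo, up = optimistic_vi(example, {3})
print(lo[0], up[0])  # brackets the true maximum 2/3 within about eps
```

In this example the maximal probability from state 0 is 2/3. Because of the self-loop, the vector assigning 1, 0.9, 0 and 1 to states 0 to 3 is also a fixed point of the Bellman operator, so a naive iteration from above could converge to it; checking Φ(u) ≤ u avoids relying on convergence from above.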

Citation (APA)

Hartmanns, A., & Kaminski, B. L. (2020). Optimistic Value Iteration. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 12225 LNCS, pp. 488–511). Springer. https://doi.org/10.1007/978-3-030-53291-8_26
