Approximating the termination value of one-counter MDPs and stochastic games

Tomáš Brázdil; Václav Brožek; Kousha Etessami; Antonín Kučera

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2011) 6756 LNCS(PART 2) 332-343

DOI: 10.1007/978-3-642-22012-8_26

Abstract

One-counter MDPs (OC-MDPs) and one-counter simple stochastic games (OC-SSGs) are, respectively, 1-player and 2-player turn-based zero-sum stochastic games played on the transition graph of classic one-counter automata (equivalently, pushdown automata with a 1-letter stack alphabet). A key objective for the analysis and verification of these games is the termination objective, where the players aim to maximize (respectively, minimize) the probability of hitting counter value 0, starting at a given control state and a given counter value. Recently [4,2], we studied qualitative decision problems ("is the optimal termination value = 1?") for OC-MDPs (and OC-SSGs) and showed them to be decidable in P-time (in NP∩coNP, respectively). However, quantitative decision and approximation problems ("is the optimal termination value ≥ p?" or "approximate the termination value within ε") are far more challenging. This is so in part because optimal strategies may not exist, and because even when they do exist they can have a highly non-trivial structure. It thus remained open whether any of these quantitative termination problems are computable at all. In this paper we show that all quantitative approximation problems for the termination value for OC-MDPs and OC-SSGs are computable. Specifically, given an OC-SSG and given ε > 0, we can compute a value v that approximates the value of the OC-SSG termination game within additive error ε, and furthermore we can compute ε-optimal strategies for both players in the game. A key ingredient in our proofs is a subtle martingale, derived from solving certain LPs that we can associate with a maximizing OC-MDP. An application of Azuma's inequality to these martingales yields a computable bound for the "wealth" at which a "rich person's strategy" becomes ε-optimal for OC-MDPs. © 2011 Springer-Verlag.
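
To make the termination objective concrete, the following is a minimal Python sketch that estimates the maximal termination probability of a toy OC-MDP by value iteration on a truncated counter. It is not the method of the paper: the function name `approx_termination_value`, the `actions` encoding, and the parameters `cap` and `sweeps` are all assumptions made for illustration. Truncation treats any counter above `cap` as never terminating, so the result is only a lower bound, with none of the ε-guarantees that the abstract describes.

```python
def approx_termination_value(actions, start_state, start_counter, cap=40, sweeps=1000):
    """Lower-bound the maximal probability of reaching counter value 0.

    Hypothetical encoding: actions[s] is a list of actions available in
    control state s; each action is a list of
    (probability, counter_delta, next_state) with counter_delta in {-1, 0, +1}.
    start_counter must lie between 1 and cap.
    """
    states = list(actions)
    # v[s][n] is the current estimate for state s and counter n.
    # Counter 0 means "terminated" (value 1); index cap + 1 stays 0,
    # i.e. runs that exceed the cap are counted as never terminating.
    v = {s: [1.0] + [0.0] * (cap + 1) for s in states}
    for _ in range(sweeps):
        new_v = {s: row[:] for s, row in v.items()}
        for s in states:
            for n in range(1, cap + 1):
                # The maximizing player picks the action with the best expected value.
                new_v[s][n] = max(
                    sum(p * v[t][n + d] for p, d, t in action)
                    for action in actions[s]
                )
        v = new_v
    return v[start_state][start_counter]


# Toy OC-MDP with one control state and two actions (assumed example):
# the first decrements the counter with probability 1/3, the second with 1/4.
toy = {
    "s": [
        [(1 / 3, -1, "s"), (2 / 3, +1, "s")],
        [(1 / 4, -1, "s"), (3 / 4, +1, "s")],
    ]
}
value = approx_termination_value(toy, "s", 1)
print(value)
```

For this toy model the first action is always preferable, and the classical gambler's-ruin computation gives a termination probability of 1/2 from counter value 1, so the printed estimate should lie just below 0.5. In general such a truncated iteration gives no a priori bound on how far the estimate is from the true value, which is the gap the paper's martingale argument addresses.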

Cite

APA

Brázdil, T., Brožek, V., Etessami, K., & Kučera, A. (2011). Approximating the termination value of one-counter MDPs and stochastic games. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 6756 LNCS, pp. 332–343). https://doi.org/10.1007/978-3-642-22012-8_26