Adaptive strategies and regret minimization in arbitrarily varying Markov environments

Shie Mannor; Nahum Shimkin

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2001) 2111 128-142

DOI: 10.1007/3-540-44581-1_9

Abstract

We consider the problem of maximizing the average reward in a controlled Markov environment that also contains some arbitrarily varying elements. This problem is captured by a two-person stochastic game model involving the reward-maximizing agent and a second player, who is free to use an arbitrary (non-stationary and unpredictable) control strategy. While the minimax value of the associated zero-sum game provides a guaranteed performance level, the fact that the second player’s behavior is observed as the game unfolds opens up the opportunity to improve upon this minimax value if the second player is not playing a worst-case strategy. This basic idea has been formalized in the context of repeated matrix games by the classical notion of regret minimization with respect to the Bayes envelope, where an attainable performance goal is defined in terms of the empirical frequencies of the opponent’s actions. This paper presents an extension of these ideas to problems with Markovian dynamics, under appropriate recurrence conditions. The Bayes envelope is first defined in a natural way in terms of the observed state-action frequencies. As this envelope may not be attained in general, we define a proper convexification thereof as an attainable solution concept. In the specific case of single-controller games, where the opponent alone controls the state transitions, the Bayes envelope itself turns out to be convex and attainable. Some concrete examples are shown to fit in this framework.
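
To make the repeated-matrix-game notion mentioned in the abstract concrete, the following is a minimal sketch, not the paper's construction. It assumes the standard definition of the Bayes envelope as the best reward obtainable by a single fixed action against the empirical frequencies of the opponent's actions. The function names, the 2x2 reward matrix, and the action sequences are all hypothetical and chosen only for illustration; the Markovian extension studied in the paper is not modeled here.

```python
from collections import Counter


def empirical_frequencies(opponent_actions, num_actions):
    """Fraction of rounds in which the opponent played each action."""
    counts = Counter(opponent_actions)
    n = len(opponent_actions)
    return [counts[b] / n for b in range(num_actions)]


def bayes_envelope(reward, q):
    """Best expected reward of any fixed agent action against frequencies q.

    reward[a][b] is the agent's reward when it plays a and the opponent plays b.
    """
    return max(sum(row[b] * q[b] for b in range(len(q))) for row in reward)


def average_regret(reward, agent_actions, opponent_actions):
    """Bayes envelope at the observed frequencies minus the realized average reward."""
    n = len(agent_actions)
    q = empirical_frequencies(opponent_actions, len(reward[0]))
    realized = sum(reward[a][b] for a, b in zip(agent_actions, opponent_actions)) / n
    return bayes_envelope(reward, q) - realized


# Hypothetical example: the agent earns 1 when it matches the opponent, else 0.
reward = [[1, 0],
          [0, 1]]
opponent = [0, 0, 0, 1]   # opponent mostly plays action 0
agent = [1, 1, 1, 1]      # agent stubbornly plays action 1

q = empirical_frequencies(opponent, 2)
envelope = bayes_envelope(reward, q)
regret = average_regret(reward, agent, opponent)
```

In this toy game the minimax value is 0.5, while the envelope at the observed frequencies is 0.75, which illustrates how a non-worst-case opponent leaves room to improve on the guaranteed level. A regret-minimizing strategy in this sense is one whose average regret tends to zero (or below) as the number of rounds grows.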

Cite

Mannor, S., & Shimkin, N. (2001). Adaptive strategies and regret minimization in arbitrarily varying Markov environments. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 2111, pp. 128–142). Springer Verlag. https://doi.org/10.1007/3-540-44581-1_9
