Model and reinforcement learning for Markov games with risk preferences


Abstract

We motivate and propose a new model for non-cooperative Markov games that accounts for the interactions of risk-aware players. The model characterizes the time-consistent dynamic "risk" arising both from stochastic state transitions (inherent to the game) and from the randomized mixed strategies of all other players. We propose an appropriate risk-aware equilibrium concept and demonstrate the existence of such equilibria in stationary strategies via an application of Kakutani's fixed point theorem. We further propose a simulation-based Q-learning type algorithm for computing risk-aware equilibria. This algorithm works with a special class of minimax risk measures that can naturally be written as saddle-point stochastic optimization problems and that covers many widely studied risk measures. Finally, we demonstrate the almost sure convergence of this simulation-based algorithm to an equilibrium under mild conditions. Our numerical experiments on a two-player queueing game validate the properties of our model and algorithm and demonstrate their applicability to real-life competitive decision-making.
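
The paper itself defines the precise risk-aware Bellman operator and proves convergence for the multi-player game; as a rough illustration of the flavor of such an update only, the following minimal Python sketch (our own construction, not the authors' algorithm) combines a tabular Q-learning step with a Rockafellar–Uryasev stochastic-approximation step for CVaR, one example of a risk measure with a saddle-point (minimax) representation. The toy MDP, learning-rate schedule, and CVaR level are all assumptions for illustration, and the single-agent setting stands in for one player's view of the game with the opponent's mixed strategy folded into the transition noise.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy MDP: 2 states, 2 actions, noisy rewards; a stand-in for one player's
# view of the game with the other player's mixed strategy absorbed into the
# randomness of transitions and rewards.
n_states, n_actions = 2, 2
P = np.array([[[0.8, 0.2], [0.3, 0.7]],
              [[0.6, 0.4], [0.1, 0.9]]])        # P[s, a, s']
R_mean = np.array([[1.0, 0.2], [0.5, 1.5]])      # mean reward for (s, a)
gamma, alpha_cvar = 0.9, 0.9                     # discount, CVaR level

Q = np.zeros((n_states, n_actions))
eta = np.zeros((n_states, n_actions))            # Rockafellar-Uryasev threshold per (s, a)

def step(s, a):
    """Sample one transition and a noisy reward."""
    s_next = rng.choice(n_states, p=P[s, a])
    r = R_mean[s, a] + rng.normal(scale=0.5)
    return r, s_next

s = 0
for t in range(1, 200_000):
    a = rng.integers(n_actions) if rng.random() < 0.1 else int(Q[s].argmax())
    r, s_next = step(s, a)
    target = r + gamma * Q[s_next].max()

    # Saddle-point form of CVaR on the negated target (the "loss"):
    #   CVaR_a(L) = min_eta  eta + E[(L - eta)_+] / (1 - a),
    # updated here by one stochastic (sub)gradient step in eta.
    lr = 1.0 / (1.0 + 0.001 * t)
    loss = -target
    grad_eta = 1.0 - float(loss > eta[s, a]) / (1.0 - alpha_cvar)
    eta[s, a] -= lr * grad_eta                   # inner minimization step
    cvar_sample = eta[s, a] + max(loss - eta[s, a], 0.0) / (1.0 - alpha_cvar)

    # Outer Q update toward the risk-adjusted value (negated CVaR of the loss).
    Q[s, a] += lr * (-cvar_sample - Q[s, a])
    s = s_next

print("Risk-adjusted Q values:\n", Q)
```

The two learning rates are tied together here only for brevity; the abstract's convergence result concerns the full equilibrium-computation scheme, not this simplified single-agent variant.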

Citation (APA)

Huang, W., Hai, P. V., & Haskell, W. B. (2020). Model and reinforcement learning for Markov games with risk preferences. In AAAI 2020 - 34th AAAI Conference on Artificial Intelligence (pp. 2022–2029). AAAI Press. https://doi.org/10.1609/aaai.v34i02.5574
