In this article, we apply policy gradient-based reinforcement learning to enable multiple agents to perform cooperative actions in a partially observable environment. We introduce an auxiliary state variable, an internal state, whose stochastic process is Markov, to extract important features of the multi-agent dynamics. Computer simulations show that every agent can identify an appropriate internal state model and acquire a good policy; this approach is shown to be more effective than a traditional memory-based method. © Springer-Verlag Berlin Heidelberg 2007.
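The core idea of conditioning a policy on a learned internal state, rather than on the unobservable environment state, can be sketched as follows. This is a minimal illustrative REINFORCE-style sketch, not the authors' exact algorithm; all sizes, parameter names, and the tabular softmax parameterization are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical problem sizes (assumptions, not from the paper).
N_OBS, N_INTERNAL, N_ACT = 2, 3, 2

# Policy logits: action given (observation, internal state).
theta_pi = np.zeros((N_OBS, N_INTERNAL, N_ACT))
# Internal-state transition logits: next internal state given (observation,
# current internal state) -- the internal state's stochastic process is Markov.
theta_mem = np.zeros((N_OBS, N_INTERNAL, N_INTERNAL))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def step_agent(obs, z):
    """Sample an action and the next internal state. The pair (obs, z)
    stands in for the full, unobservable environment state."""
    p_a = softmax(theta_pi[obs, z])
    a = rng.choice(N_ACT, p=p_a)
    p_z = softmax(theta_mem[obs, z])
    z_next = rng.choice(N_INTERNAL, p=p_z)
    return a, z_next, p_a, p_z

def reinforce_update(episode, returns, lr=0.1):
    """Vanilla likelihood-ratio (REINFORCE) update: both the policy and the
    internal-state model are adjusted in the direction that makes the
    sampled choices more likely, weighted by the episode return G."""
    for (obs, z, a, z_next), G in zip(episode, returns):
        grad_pi = -softmax(theta_pi[obs, z])
        grad_pi[a] += 1.0                      # grad of log pi(a | obs, z)
        theta_pi[obs, z] += lr * G * grad_pi
        grad_mem = -softmax(theta_mem[obs, z])
        grad_mem[z_next] += 1.0                # grad of log p(z' | obs, z)
        theta_mem[obs, z] += lr * G * grad_mem
```

Because the internal-state transition model is itself parameterized and trained with the same policy gradient, each agent can, in principle, discover a compact memory representation of the other agents' dynamics alongside its policy.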
CITATION STYLE
Taniguchi, Y., Mori, T., & Ishii, S. (2007). Reinforcement learning for cooperative actions in a partially observable multi-agent system. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4668 LNCS, pp. 229–238). Springer Verlag. https://doi.org/10.1007/978-3-540-74690-4_24