The trade-off between exploitation and exploration in multi-agent system learning has been a crucial area of research for the past few decades. A proper learning policy is necessary for agents to react rapidly and adapt in a dynamic environment. A family of core learning policies identified in the open literature are suitable for the non-stationary multi-agent foraging task modeled in this paper. The model is used to compare and contrast the identified learning policies, namely greedy, ε-greedy, and Boltzmann distribution. A simple random search is also included to justify the convergence of Q-learning. A number of simulation-based experiments were conducted, and based on the numerical results obtained, the performances of the learning policies are discussed. © 2010 Springer-Verlag.
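The abstract names four action-selection schemes: greedy, ε-greedy, Boltzmann (softmax), and a uniform random baseline. As a minimal sketch of how these policies choose an action from a state's Q-values, the following is illustrative only (function name, parameters, and defaults are assumptions, not the paper's implementation):

```python
import math
import random

def select_action(q_values, policy, epsilon=0.1, temperature=1.0):
    """Pick an action index from a list of Q-values under a given policy.

    q_values: Q(s, a) for each action a in the current state.
    policy: "greedy", "epsilon_greedy", "boltzmann", or "random".
    Illustrative sketch only; not the paper's implementation.
    """
    n = len(q_values)
    if policy == "random":
        # Uniform random search baseline.
        return random.randrange(n)
    if policy == "greedy":
        # Always exploit: pick the highest-valued action.
        return max(range(n), key=lambda a: q_values[a])
    if policy == "epsilon_greedy":
        # Explore with probability epsilon, otherwise exploit.
        if random.random() < epsilon:
            return random.randrange(n)
        return max(range(n), key=lambda a: q_values[a])
    if policy == "boltzmann":
        # Softmax over Q-values; temperature controls exploration
        # (high temperature -> near-uniform, low -> near-greedy).
        m = max(q_values)  # subtract max for numerical stability
        weights = [math.exp((q - m) / temperature) for q in q_values]
        r = random.random() * sum(weights)
        acc = 0.0
        for a, w in enumerate(weights):
            acc += w
            if r <= acc:
                return a
        return n - 1
    raise ValueError(f"unknown policy: {policy}")
```

The greedy policy never explores, ε-greedy explores uniformly with a fixed small probability, and the Boltzmann policy biases exploration toward actions with higher estimated value.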
CITATION STYLE
Yogeswaran, M., & Ponnambalam, S. G. (2010). Q-learning policies for multi-agent foraging task. In Communications in Computer and Information Science (Vol. 103 CCIS, pp. 194–201). https://doi.org/10.1007/978-3-642-15810-0_25