The trade-off between exploitation and exploration in multi-agent system learning has been a crucial area of research for the past few decades. A proper learning policy is necessary for agents to react rapidly and adapt in a dynamic environment. A family of core learning policies identified in the open literature are suitable for the non-stationary multi-agent foraging task modeled in this paper. The model is used to compare and contrast the identified learning policies, namely greedy, ε-greedy, and Boltzmann distribution. A simple random search is also included to justify the convergence of Q-learning. A number of simulation-based experiments were conducted, and based on the numerical results obtained, the performances of the learning policies are discussed. © 2010 Springer-Verlag.
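The abstract names four action-selection schemes: greedy, ε-greedy, Boltzmann (softmax), and a uniform random baseline. As a minimal sketch of how these policies choose an action from a state's Q-values, the following is illustrative only (function name, parameters, and defaults are assumptions, not the paper's implementation):

```python
import math
import random

def select_action(q_values, policy, epsilon=0.1, temperature=1.0):
    """Pick an action index from a list of Q-values under a given policy.

    q_values: Q(s, a) for each action a in the current state.
    policy: "greedy", "epsilon_greedy", "boltzmann", or "random".
    Illustrative sketch only; not the paper's implementation.
    """
    n = len(q_values)
    if policy == "random":
        # Uniform random search baseline.
        return random.randrange(n)
    if policy == "greedy":
        # Always exploit: pick the highest-valued action.
        return max(range(n), key=lambda a: q_values[a])
    if policy == "epsilon_greedy":
        # Explore with probability epsilon, otherwise exploit.
        if random.random() < epsilon:
            return random.randrange(n)
        return max(range(n), key=lambda a: q_values[a])
    if policy == "boltzmann":
        # Softmax over Q-values; temperature controls exploration
        # (high temperature -> near-uniform, low -> near-greedy).
        m = max(q_values)  # subtract max for numerical stability
        weights = [math.exp((q - m) / temperature) for q in q_values]
        r = random.random() * sum(weights)
        acc = 0.0
        for a, w in enumerate(weights):
            acc += w
            if r <= acc:
                return a
        return n - 1
    raise ValueError(f"unknown policy: {policy}")
```

The greedy policy never explores, ε-greedy explores uniformly with a fixed small probability, and the Boltzmann policy biases exploration toward actions with higher estimated value.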
CITATION STYLE
Yogeswaran, M., & Ponnambalam, S. G. (2010). Q-learning policies for multi-agent foraging task. In Communications in Computer and Information Science (Vol. 103 CCIS, pp. 194–201). https://doi.org/10.1007/978-3-642-15810-0_25