The Commodity Market (CM) economic model offers a promising approach to distributed resource allocation in large-scale distributed systems. Existing CM-based mechanisms apply Economic Equilibrium concepts, assuming price-taking entities that will not engage in strategic behaviour. In this paper we address this issue and investigate the dynamics of strategic learning agents in a specific type of CM-based mechanism called Iterative Price Adjustment. We investigate the scenario where agents use utility functions to describe their preferences over the allocation and use Reinforcement Learning to learn demand functions adapted to the market. The reward functions used during the learning process are based either on the individual utility of the agents, generating selfish learning agents, or on the social welfare of the market, generating altruistic learning agents. Our experiments show that a market composed exclusively of selfish learning agents achieves results similar to those obtained by a market composed of altruistic agents. Such an outcome is significant for a series of other domains where individual and social utility should be maximized but agents are not guaranteed to act cooperatively to achieve this, or do not want to reveal private preferences. We further investigate this outcome and present an analysis of the agents' behaviour from the perspective of the dynamic process generated by the learning algorithm they employ. For this, we develop a theoretical model of Multiagent Q-learning with ε-greedy exploration and apply it to a simplified version of the addressed scenario. © 2010 Springer-Verlag Berlin Heidelberg.
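To make the setting described above more concrete, the following is a minimal sketch of ε-greedy Q-learning agents in a market cleared by iterative price adjustment. It is not the chapter's model: the demand form `k / price`, the logarithmic utility, the action set `ACTIONS`, the constant `SUPPLY`, the definition of social welfare as the sum of utilities, and the stateless Q-learning update (single state, discount term dropped) are all assumptions chosen for illustration, and every function name is hypothetical.

```python
import math
import random

ACTIONS = [1.0, 2.0, 4.0]  # hypothetical demand-scale parameters k (assumption)
SUPPLY = 10.0              # hypothetical fixed resource supply (assumption)

def clear_price(ks, supply=SUPPLY, price=1.0, step=0.01, tol=1e-6, max_iters=5000):
    """Iterative price adjustment: raise the price under excess demand, lower it otherwise."""
    for _ in range(max_iters):
        excess = sum(k / price for k in ks) - supply  # assumed demand d(p) = k / p
        if abs(excess) < tol:
            break
        price = max(1e-6, price + step * excess)      # keep the price strictly positive
    return price

def utility(value, k, price):
    """Assumed utility: log benefit of the allocation minus the payment."""
    x = k / price
    return value * math.log(1.0 + x) - price * x

def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon explore uniformly; otherwise pick the first best action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return q_values.index(max(q_values))

def q_update(q_values, action, reward, alpha):
    """Stateless Q-learning update (single state, no discounted bootstrap term)."""
    q_values[action] += alpha * (reward - q_values[action])

def run_market(reward_mode, values=(3.0, 5.0), episodes=2000,
               epsilon=0.1, alpha=0.1, seed=0):
    """Train one Q-table per agent; reward_mode is 'selfish' or 'altruistic'."""
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in values]
    for _ in range(episodes):
        chosen = [epsilon_greedy(q_i, epsilon, rng) for q_i in q]
        ks = [ACTIONS[a] for a in chosen]
        price = clear_price(ks)
        utils = [utility(v, k, price) for v, k in zip(values, ks)]
        welfare = sum(utils)  # social welfare assumed to be the sum of utilities
        for i, a in enumerate(chosen):
            reward = utils[i] if reward_mode == "selfish" else welfare
            q_update(q[i], a, reward, alpha)
    return q
```

Calling `run_market("selfish")` and `run_market("altruistic")` returns the learned Q-tables for the two reward definitions, which can then be compared. Any outcome of this toy setup reflects the assumptions above and should not be read as reproducing the chapter's experiments.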
Citation
Gomes, E. R., & Kowalczyk, R. (2010). The dynamics of multiagent Q-learning in commodity market resource allocation. Studies in Computational Intelligence, 263, 315–349. https://doi.org/10.1007/978-3-642-05179-1_15