This work develops a reinforcement learning method for multi-agent negotiation. While existing works have developed various learning methods for multi-agent negotiation, they have primarily focused on Temporal-Difference (TD) algorithms (action-value methods) and overlooked the unique properties of parameterized policies. As such, these methods can be suboptimal for multi-agent negotiation. In this paper, we study the problem of multi-agent negotiation in real-time bidding scenarios. We propose a new method named EQL, short for Extended Q-learning, which iteratively updates the state transition probability and converges effectively to a unique optimum. By performing a purposeful linear approximation of the off-policy critic, we integrate Expected Policy Gradients (EPG) into basic Q-learning. We further propose a novel negotiation framework that combines EQL with edge computing between mobile devices and cloud servers, handling data preprocessing and transmission simultaneously to reduce the load on cloud servers. We conduct extensive experiments on two real datasets. Both quantitative results and qualitative analysis verify the effectiveness and rationality of our EQL method.
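To make the idea of blending a policy-weighted (expected) update into basic Q-learning concrete, the sketch below shows a minimal tabular agent whose TD target averages next-state action values under the current policy rather than taking the max. This is an illustrative, Expected-SARSA-style simplification, not the paper's actual EQL algorithm: the class name, the epsilon-greedy policy, and all hyperparameters are assumptions, and the paper's off-policy critic approximation and negotiation framework are not modeled here.

```python
import numpy as np

class ExpectedQAgent:
    """Tabular Q-learning with a policy-weighted (expected) TD target.

    Illustrative sketch only; EQL's critic approximation and
    negotiation setup are not reproduced here.
    """

    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.Q = np.zeros((n_states, n_actions))
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.n_actions = n_actions

    def policy_probs(self, s):
        # Epsilon-greedy distribution over actions in state s.
        probs = np.full(self.n_actions, self.epsilon / self.n_actions)
        probs[np.argmax(self.Q[s])] += 1.0 - self.epsilon
        return probs

    def act(self, s, rng):
        # Sample an action from the current policy distribution.
        return int(rng.choice(self.n_actions, p=self.policy_probs(s)))

    def update(self, s, a, r, s_next):
        # Expected target: average next-state value under the current
        # policy, instead of the greedy max used by vanilla Q-learning.
        expected_v = float(np.dot(self.policy_probs(s_next), self.Q[s_next]))
        td_error = r + self.gamma * expected_v - self.Q[s, a]
        self.Q[s, a] += self.alpha * td_error
        return td_error
```

Averaging over the policy distribution reduces the variance introduced by exploratory action selection, which is one motivation for expectation-based updates in negotiation settings where opponents' behavior makes returns noisy.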
Citation
Kong, C., Chen, B., Li, S., Chen, J., Chen, Y., & Zhang, L. (2020). An advanced q-learning model for multi-agent negotiation in real-time bidding. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 12432 LNCS, pp. 491–502). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-030-60029-7_44