Decentralized Scheduling with QoS Constraints: Achieving O(1) QoS Regret of Multi-Player Bandits

Abstract

We consider a decentralized multi-player multi-armed bandit (MP-MAB) problem in which players cannot observe the actions and rewards of other players, and no explicit communication or coordination between players is possible. Prior studies mostly focus on maximizing the sum of the players' rewards over time. However, maximizing the total reward may produce an imbalanced reward distribution among players, resulting in poor Quality of Service (QoS) for some of them. In contrast, our objective is to let each player n achieve a predetermined expected average reward over time, i.e., a predetermined level of QoS. We develop a novel decentralized MP-MAB algorithm that accomplishes this objective by leveraging the methodology of randomized matching. We prove that our decentralized algorithm ensures that every player incurs only an O(1) QoS regret. We also reveal an analogy between our MP-MAB model and online wireless queuing systems, which connects QoS in MP-MAB learning with stability in queuing theory.
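For concreteness, one plausible formalization of the QoS-regret objective described above, assuming each player n has a target average reward λ_n and r_n(t) denotes the reward player n collects in round t (this notation is introduced here for illustration and is not taken from the paper), is

R_n(T) \;=\; \max\Bigg\{ \lambda_n T \;-\; \mathbb{E}\Bigg[\sum_{t=1}^{T} r_n(t)\Bigg],\; 0 \Bigg\}.

Under this reading, an O(1) QoS regret means that R_n(T) stays bounded by a constant independent of the horizon T for every player n, which mirrors the queuing-theoretic notion that a stable queue's backlog remains bounded over time.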

CITATION STYLE

APA

Liu, Q., & Fang, Z. (2024). Decentralized Scheduling with QoS Constraints: Achieving O(1) QoS Regret of Multi-Player Bandits. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 38, pp. 13981–13989). Association for the Advancement of Artificial Intelligence. https://doi.org/10.1609/aaai.v38i12.29306
