Unifying the stochastic and the adversarial bandits with knapsack

Anshuka Rangi; Massimo Franceschetti; Long Tran-Thanh

Conference ProceedingsOPEN ACCESS

Unifying the stochastic and the adversarial bandits with knapsack

IJCAI International Joint Conference on Artificial Intelligence (2019) 2019-August 3311-3317

DOI: 10.24963/ijcai.2019/459

13Citations

25Readers

Abstract

This work investigates the adversarial Bandits with Knapsack (BwK) learning problem, where a player repeatedly chooses to perform an action, pays the corresponding cost of the action, and receives a reward associated with the action. The player is constrained by the maximum budget that can be spent to perform the actions, and the rewards and the costs of these actions are assigned by an adversary. This setting is studied in terms of expected regret, defined as the difference between the total expected rewards per unit cost corresponding the best fixed action and the total expected rewards per unit cost of the learning algorithm. We propose a novel algorithm EXP3.BwK and show that the expected regret of the algorithm is order optimal in the budget. We then propose another algorithm EXP3++.BwK, which is order optimal in the adversarial BwK setting, and incurs an almost optimal expected regret in the stochastic BwK setting where the rewards and the costs are drawn from unknown underlying distributions. These results are then extended to a more general online learning setting, by designing another algorithm EXP3++.LwK and providing its performance guarantees. Finally, we investigate the scenario where the costs of the actions are large and comparable to the budget. We show that for the adversarial setting, the achievable regret bounds scale at least linearly with the maximum cost for any learning algorithm, and are significantly worse in comparison to the case of having costs bounded by a constant, which is a common assumption in the BwK literature.

Cite

CITATION STYLE

APA

Rangi, A., Franceschetti, M., & Tran-Thanh, L. (2019). Unifying the stochastic and the adversarial bandits with knapsack. In IJCAI International Joint Conference on Artificial Intelligence (Vol. 2019-August, pp. 3311–3317). International Joint Conferences on Artificial Intelligence. https://doi.org/10.24963/ijcai.2019/459

Unifying the stochastic and the adversarial bandits with knapsack

Abstract

Cite

Register to see more suggestions