A near-optimal change-detection based algorithm for piecewise-stationary combinatorial semi-bandits

Abstract

We investigate the piecewise-stationary combinatorial semi-bandit problem. Unlike the original combinatorial semi-bandit problem, our setting allows the reward distributions of base arms to change in a piecewise-stationary manner at unknown time steps. We propose an algorithm, GLR-CUCB, which combines an efficient combinatorial semi-bandit algorithm, CUCB, with an almost parameter-free change-point detector, the Generalized Likelihood Ratio Test (GLRT). Our analysis shows that the regret of GLR-CUCB is upper bounded by O(√(NKT log T)), where N is the number of piecewise-stationary segments, K is the number of base arms, and T is the number of time steps. As a complement, we also derive a nearly matching regret lower bound of order Ω(√(NKT)) for both piecewise-stationary multi-armed bandits and combinatorial semi-bandits, using information-theoretic techniques and judiciously constructed piecewise-stationary bandit instances. This lower bound is tighter than the best previously available regret lower bound, Ω(√T). Numerical experiments on both synthetic and real-world datasets show that GLR-CUCB outperforms other state-of-the-art algorithms.
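To make the change-detection idea concrete, the following is a minimal sketch of a generalized likelihood ratio statistic for a single base arm, assuming Bernoulli rewards in {0, 1}. The function names (`kl_bernoulli`, `glr_statistic`, `change_detected`) and the caller-supplied `threshold` are illustrative assumptions, not the paper's code; the abstract does not give the threshold used by GLR-CUCB, so none is reproduced here.

```python
import math


def kl_bernoulli(p, q, eps=1e-12):
    """KL divergence between Bernoulli(p) and Bernoulli(q), clamped away from 0 and 1."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def glr_statistic(samples):
    """Largest split statistic over all split points s = 1, ..., n-1.

    For each split, the empirical means before and after s are compared
    with the overall mean, weighted by the number of samples on each side.
    """
    n = len(samples)
    if n < 2:
        return 0.0
    total = sum(samples)
    mean_all = total / n
    best = 0.0
    prefix = 0.0
    for s in range(1, n):
        prefix += samples[s - 1]
        mean_left = prefix / s
        mean_right = (total - prefix) / (n - s)
        stat = (s * kl_bernoulli(mean_left, mean_all)
                + (n - s) * kl_bernoulli(mean_right, mean_all))
        best = max(best, stat)
    return best


def change_detected(samples, threshold):
    """Flag a change when the statistic exceeds a caller-chosen threshold (an assumption here)."""
    return glr_statistic(samples) > threshold
```

In a change-detection based bandit algorithm, a detector of this kind would typically be run on the observations of each base arm, and a detection would trigger a restart of the underlying bandit algorithm's statistics; the details of how GLR-CUCB does this are in the paper itself.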

Zhou, H., Wang, L., Varshney, L. R., & Lim, E. P. (2020). A near-optimal change-detection based algorithm for piecewise-stationary combinatorial semi-bandits. In AAAI 2020 - 34th AAAI Conference on Artificial Intelligence (pp. 6933–6940). AAAI Press. https://doi.org/10.1609/aaai.v34i04.6176