Abstract
In this paper, we study the problem of stochastic linear bandits with finite action sets. Most existing work assumes that the payoffs are bounded or sub-Gaussian, an assumption that may be violated in some scenarios, such as financial markets. To address this issue, we analyze linear bandits with heavy-tailed payoffs, where the payoffs admit finite $1+\epsilon$ moments for some $\epsilon \in (0, 1]$. Using median of means and dynamic truncation, we propose two novel algorithms that enjoy a sublinear regret bound of $\widetilde{O}(d^{\frac{1}{2}} T^{\frac{1}{1+\epsilon}})$, where $d$ is the dimension of the contextual information and $T$ is the time horizon. We also provide an $\Omega(d^{\frac{\epsilon}{1+\epsilon}} T^{\frac{1}{1+\epsilon}})$ lower bound, which implies that our upper bound matches the lower bound up to polylogarithmic factors in $d$ and $T$ when $\epsilon = 1$. Finally, we conduct numerical experiments that demonstrate the effectiveness of our algorithms, and the empirical results strongly support our theoretical guarantees.
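The abstract names two robust mean-estimation ideas, median of means and dynamic truncation, without spelling them out. The sketch below shows generic scalar versions of both, assuming i.i.d. samples with a bounded $(1+\epsilon)$-th moment $u$. It is not the paper's algorithm: the linear-bandit algorithms apply these ideas inside estimates of an unknown parameter vector, which this sketch does not attempt. The function names, and the truncation threshold $b_t = (u t / \log(1/\delta))^{1/(1+\epsilon)}$ (a common choice for scalar truncated means under heavy tails), are assumptions made for illustration.

```python
import math
import statistics
from typing import Sequence


def median_of_means(samples: Sequence[float], num_groups: int) -> float:
    """Split samples into num_groups equal-size blocks, average each block,
    and return the median of the block means (leftover samples are dropped)."""
    if num_groups < 1 or len(samples) < num_groups:
        raise ValueError("need at least one sample per group")
    block = len(samples) // num_groups
    means = [
        statistics.fmean(samples[i * block:(i + 1) * block])
        for i in range(num_groups)
    ]
    # A single extreme sample can corrupt at most one block mean,
    # so the median of the block means stays stable.
    return statistics.median(means)


def truncated_mean(
    samples: Sequence[float], eps: float, moment_bound: float, delta: float
) -> float:
    """Average the samples after zeroing any sample whose magnitude exceeds
    a threshold that grows with its index t (a dynamic truncation)."""
    if not samples:
        raise ValueError("need at least one sample")
    if not (0.0 < eps <= 1.0) or not (0.0 < delta < 1.0) or moment_bound <= 0.0:
        raise ValueError("require eps in (0, 1], delta in (0, 1), moment_bound > 0")
    total = 0.0
    for t, x in enumerate(samples, start=1):
        # Assumed threshold: b_t = (u * t / log(1/delta)) ** (1 / (1 + eps)).
        threshold = (moment_bound * t / math.log(1.0 / delta)) ** (1.0 / (1.0 + eps))
        total += x if abs(x) <= threshold else 0.0
    return total / len(samples)


if __name__ == "__main__":
    data = [1, 2, 3, 4, 5, 6, 100, 7, 8]
    # Block means are 2, 5 and about 38.3; their median ignores the outlier.
    print(median_of_means(data, 3))
    print(truncated_mean([0.5, 0.5, 1000.0], eps=1.0, moment_bound=1.0, delta=math.exp(-1)))
```

In both estimators a single very large payoff has a bounded effect: median of means confines it to one block, while truncation discards it once it exceeds the slowly growing threshold. Both are tunable through a confidence parameter (the number of groups, or $\delta$).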
Citation
Xue, B., Wang, G., Wang, Y., & Zhang, L. (2020). Nearly optimal regret for stochastic linear bandits with heavy-tailed payoffs. In IJCAI International Joint Conference on Artificial Intelligence (Vol. 2021-January, pp. 2936–2942). International Joint Conferences on Artificial Intelligence. https://doi.org/10.24963/ijcai.2020/406