Infinite horizon multi-armed bandits with reward vectors: Exploration/exploitation trade-off

Abstract

We focus on the effect of exploration/exploitation trade-off strategies on the algorithmic design of multi-armed bandits (MAB) with reward vectors. The Pareto dominance relation assesses the quality of reward vectors in infinite-horizon MAB algorithms such as UCB1 and UCB2. In single-objective MABs, there is a trade-off between exploration of the suboptimal arms and exploitation of the single optimal arm. Pareto-dominance-based MABs fairly exploit all Pareto-optimal arms while also exploring the suboptimal arms. We study the exploration vs. exploitation trade-off for two UCB-like algorithms for reward vectors. We analyse the properties of the proposed MAB algorithms in terms of upper regret bounds, and we experimentally compare their exploration vs. exploitation trade-offs on a bi-objective Bernoulli environment from control theory.
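
To make the setting concrete, below is a minimal Python sketch of a Pareto-dominance-based UCB policy for reward vectors, in the spirit of the algorithms the paper studies. It is an illustration under assumptions, not the paper's exact method: the exploration bonus below is the standard single-objective UCB1 term, whereas the paper's variants use refined bonuses that yield the stated regret bounds. The callback pull_arm and all other names are hypothetical.

import math
import random

def pareto_dominates(u, v):
    # u dominates v: at least as good in every objective, strictly better in at least one.
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))

def pareto_ucb(pull_arm, n_arms, n_objectives, horizon):
    # pull_arm(i) is assumed to return a reward vector in [0, 1]^n_objectives.
    counts = [0] * n_arms
    sums = [[0.0] * n_objectives for _ in range(n_arms)]
    for i in range(n_arms):  # initialization: pull each arm once
        reward = pull_arm(i)
        counts[i] = 1
        sums[i] = [s + x for s, x in zip(sums[i], reward)]
    for t in range(n_arms, horizon):
        # Per-arm UCB index vector: empirical mean plus an exploration bonus per objective.
        index = [[s / counts[i] + math.sqrt(2.0 * math.log(t + 1) / counts[i])
                  for s in sums[i]] for i in range(n_arms)]
        # Pareto front of the index vectors: arms whose index is not dominated by any other arm's.
        front = [i for i in range(n_arms)
                 if not any(pareto_dominates(index[j], index[i])
                            for j in range(n_arms) if j != i)]
        # Fair exploitation: choose uniformly at random among the Pareto-optimal arms.
        arm = random.choice(front)
        reward = pull_arm(arm)
        counts[arm] += 1
        sums[arm] = [s + x for s, x in zip(sums[arm], reward)]
    return counts

For a bi-objective Bernoulli environment like the one in the paper's experiments, one could pass, for example, pull_arm = lambda i: tuple(float(random.random() < p) for p in means[i]) for a hypothetical list of mean vectors means.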

Citation (APA)

Drugan, M. M. (2015). Infinite horizon multi-armed bandits with reward vectors: Exploration/exploitation trade-off. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9494, pp. 128–144). Springer Verlag. https://doi.org/10.1007/978-3-319-27947-3_7
