A novel evaluation methodology for assessing off-policy learning methods in contextual bandits

Abstract

We propose a novel evaluation methodology for assessing off-policy learning methods in contextual bandits. In particular, we provide a way to use data from any given Randomized Controlled Trial (RCT) to generate a range of observational studies with synthesized “outcome functions” that match a user-specified degree of sample selection bias; these studies can then be used to comprehensively assess a given learning method. This is especially important for evaluating methods developed for precision medicine, where deploying a bad policy can have devastating effects. Because the outcome function specifies the real-valued quality of every treatment for every instance, the quality of any proposed treatment policy can be computed exactly. This paper uses this evaluation methodology to establish a common ground for comparing the robustness and performance of the off-policy learning methods available in the literature.
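To make the pipeline described in the abstract concrete, here is a minimal sketch of one way such an evaluation could be set up. It is not the paper's actual construction: the outcome-function family, the `bias` parameterization of the logging policy, and all names (`f`, `make_observational`, `true_policy_value`) are illustrative assumptions. The key ideas it illustrates are (1) treatments in the synthetic observational study depend on context to a controllable degree, and (2) because the outcome function is known by construction, any candidate policy's value can be computed exactly rather than estimated.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical RCT-style data: contexts X, binary treatments, noisy outcomes.
n, d = 5000, 5
X = rng.normal(size=(n, d))

def f(X, t):
    """Synthesized outcome function: real-valued quality of treatment t
    for every instance in X. Known by construction, so policy values
    can be computed exactly (illustrative functional form)."""
    base = X[:, 0] - 0.5 * X[:, 1]
    effect = 1.0 + X[:, 2]          # heterogeneous treatment effect
    return base + t * effect

# RCT assignment: treatment independent of context (no selection bias).
t_rct = rng.integers(0, 2, size=n)
y_rct = f(X, t_rct) + rng.normal(scale=0.1, size=n)

def make_observational(X, bias):
    """Resample treatments under a context-dependent logging policy.
    `bias` in [0, 1] controls the degree of sample selection bias:
    0 recovers uniform (RCT-like) assignment, 1 is strongly confounded."""
    p_treat = 0.5 * (1.0 - bias) + bias * (X[:, 2] > 0)
    t_obs = (rng.random(len(X)) < p_treat).astype(int)
    y_obs = f(X, t_obs) + rng.normal(scale=0.1, size=len(X))
    return t_obs, y_obs

def true_policy_value(pi, X):
    """Exact value of policy pi: mean outcome under f when pi picks the
    treatment for every context. Possible only because f is known."""
    return f(X, pi(X)).mean()

# Example: generate a strongly biased observational study for training,
# then score two candidate policies against the exact ground truth.
t_obs, y_obs = make_observational(X, bias=0.8)
always_treat = lambda X: np.ones(len(X), dtype=int)
oracle = lambda X: (f(X, 1) > f(X, 0)).astype(int)
print("V(always_treat) =", true_policy_value(always_treat, X))
print("V(oracle)       =", true_policy_value(oracle, X))
```

In this sketch, an off-policy learner would be trained only on `(X, t_obs, y_obs)` while `true_policy_value` serves as the exact evaluation oracle; sweeping `bias` over a grid yields the "range of observational studies" against which a method's robustness can be compared.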

Citation (APA)

Hassanpour, N., & Greiner, R. (2018). A novel evaluation methodology for assessing off-policy learning methods in contextual bandits. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10832 LNAI, pp. 31–44). Springer Verlag. https://doi.org/10.1007/978-3-319-89656-4_3
