We propose a novel evaluation methodology for assessing off-policy learning methods in contextual bandits. In particular, we provide a way to use data from any given Randomized Controlled Trial (RCT) to generate a range of observational studies with synthesized “outcome functions” that match a user-specified degree of sample selection bias; these studies can then be used to comprehensively assess a given learning method. This is especially important when evaluating methods developed for precision medicine, where deploying a bad policy can have devastating effects. As the outcome function specifies the real-valued quality of every treatment for every instance, we can accurately compute the quality of any proposed treatment policy. This paper uses this evaluation methodology to establish common ground for comparing the robustness and performance of the off-policy learning methods available in the literature.
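The pipeline described above can be illustrated with a small sketch: a known, synthesized outcome function assigns a real-valued quality to each (context, treatment) pair; an observational dataset is then generated by assigning treatments with a propensity tilted by a bias knob (bias = 0 recovers a uniformly randomized trial); and, because the outcome function is known, the true value of any candidate policy can be computed directly. The specific outcome function, the two-treatment setting, and the bias parameterization here are illustrative assumptions, not the authors' construction.

```python
import numpy as np

rng = np.random.default_rng(0)

def outcome(x, t):
    # Hypothetical synthesized outcome function mu(x, t): the
    # real-valued quality of binary treatment t for context x.
    return np.where(t == 1, x[:, 0] - x[:, 1], x[:, 1] - 0.5 * x[:, 0])

def make_observational(n, bias):
    # Draw contexts as in an RCT, then assign treatments with a
    # propensity tilted toward the better treatment by `bias`
    # (bias = 0.0 reproduces a uniformly randomized trial).
    x = rng.normal(size=(n, 2))
    better = (outcome(x, np.ones(n)) > outcome(x, np.zeros(n))).astype(float)
    p_t1 = 0.5 + bias * (better - 0.5)          # selection-biased propensity
    t = rng.binomial(1, p_t1)
    y = outcome(x, t) + rng.normal(scale=0.1, size=n)  # noisy logged outcome
    return x, t, y

def true_policy_value(policy, n=100_000):
    # Because mu is known, the quality of any policy can be computed
    # directly (up to Monte Carlo error over sampled contexts).
    x = rng.normal(size=(n, 2))
    return outcome(x, policy(x)).mean()

# Example: a simple thresholding policy evaluated against the true mu.
policy = lambda x: (x[:, 0] > x[:, 1]).astype(int)
value = true_policy_value(policy)
```

An off-policy learning method would be trained only on the biased `(x, t, y)` tuples, and its learned policy scored with `true_policy_value`, so that robustness can be charted as a function of the bias level.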
CITATION STYLE
Hassanpour, N., & Greiner, R. (2018). A novel evaluation methodology for assessing off-policy learning methods in contextual bandits. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10832 LNAI, pp. 31–44). Springer Verlag. https://doi.org/10.1007/978-3-319-89656-4_3