Evaluating the robustness of off-policy evaluation

Abstract

Off-policy evaluation (OPE), and offline evaluation in general, assesses the performance of hypothetical policies using only offline log data. It is particularly useful in applications where online interaction is high-stakes and expensive, such as precision medicine and recommender systems. Because many OPE estimators have been proposed, and some of them have hyperparameters that must be tuned, practitioners face an emerging challenge in selecting and tuning OPE estimators for their specific application. Unfortunately, identifying a reliable estimator from results reported in research papers is often difficult, because the current experimental procedure evaluates and compares estimators on only a narrow set of hyperparameters and evaluation policies, making it hard to know which estimator is safe and reliable to use. In this work, we develop Interpretable Evaluation for Offline Evaluation (IEOE), an experimental procedure for evaluating OPE estimators' robustness to changes in hyperparameters and/or evaluation policies in an interpretable manner. Using the IEOE procedure, we then perform an extensive evaluation of a wide variety of existing estimators on Open Bandit Dataset, a large-scale public real-world dataset for OPE. We demonstrate that our procedure can evaluate estimators' robustness to hyperparameter choices, helping us avoid unsafe estimators. Finally, we apply IEOE to real-world e-commerce platform data and demonstrate how to use our protocol in practice.
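
As a rough illustration of the kind of hyperparameter sensitivity IEOE is designed to expose, the sketch below computes a clipped inverse propensity weighting (IPW) estimate of a policy's value over a grid of clipping thresholds and reports the spread of the resulting estimates. This is a minimal, hypothetical sketch, not the authors' IEOE implementation; the synthetic log data and the clipping grid are assumptions made purely for illustration.

```python
import numpy as np

def clipped_ipw(rewards, behavior_probs, eval_probs, clip):
    """Clipped inverse propensity weighting (IPW) OPE estimate.

    V_hat = mean( min(pi_e(a|x) / pi_b(a|x), clip) * r )

    `clip` is the hyperparameter whose choice the estimate may be
    sensitive to -- the kind of fragility IEOE aims to surface.
    """
    weights = np.minimum(eval_probs / behavior_probs, clip)
    return float(np.mean(weights * rewards))

# Synthetic logged bandit feedback (purely illustrative).
rng = np.random.default_rng(0)
n = 10_000
behavior_probs = rng.uniform(0.05, 0.95, size=n)  # pi_b(a_i | x_i)
eval_probs = rng.uniform(0.05, 0.95, size=n)      # pi_e(a_i | x_i)
rewards = rng.binomial(1, 0.3, size=n).astype(float)

# Re-estimate the policy value over a grid of clipping thresholds;
# a wide spread across the grid signals hyperparameter sensitivity.
estimates = {c: clipped_ipw(rewards, behavior_probs, eval_probs, c)
             for c in (1.0, 2.0, 5.0, 10.0, np.inf)}
for c, v in estimates.items():
    print(f"clip={c:>5}: V_hat={v:.4f}")
print("spread:", max(estimates.values()) - min(estimates.values()))
```

Roughly speaking, IEOE summarizes robustness via the distribution of estimation errors across many such hyperparameter and evaluation-policy configurations; the single spread statistic above is only a stand-in for that idea.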

Citation (APA)

Saito, Y., Udagawa, T., Kiyohara, H., Mogi, K., Narita, Y., & Tateno, K. (2021). Evaluating the robustness of off-policy evaluation. In RecSys 2021 - 15th ACM Conference on Recommender Systems (pp. 114–123). Association for Computing Machinery. https://doi.org/10.1145/3460231.3474245
