Importance sampling for fair policy selection

Abstract

We consider the problem of off-policy policy selection in reinforcement learning: using historical data generated by running one policy to compare two or more policies. We show that approaches based on importance sampling can be unfair: they can select the worse of two policies more often than not. We then give an example showing that importance sampling is systematically unfair in a practically relevant setting; namely, it unreasonably favors shorter trajectory lengths. We then present sufficient conditions under which fairness is theoretically guaranteed. Finally, we provide a practical importance sampling-based estimator that helps mitigate the unfairness caused by varying trajectory lengths.
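For context, the ordinary per-trajectory importance sampling estimator that the abstract refers to is standardly written as below; the notation (behavior policy $\pi_b$, evaluation policy $\pi_e$, $n$ logged trajectories of lengths $T_i$, discount factor $\gamma$) is our choice, since the abstract does not fix symbols:

$$\hat{V}_{\mathrm{IS}}(\pi_e) \;=\; \frac{1}{n}\sum_{i=1}^{n}\left(\prod_{t=0}^{T_i-1}\frac{\pi_e(a_t^i \mid s_t^i)}{\pi_b(a_t^i \mid s_t^i)}\right)\sum_{t=0}^{T_i-1}\gamma^{t}\, r_t^i$$

Because the likelihood-ratio product is taken over every step of a trajectory, longer trajectories tend to receive smaller weights, which is the intuition behind the length bias the paper analyzes.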
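A minimal sketch of how this estimator might drive policy selection, assuming a simple trajectory representation and undiscounted returns ($\gamma = 1$); the function and variable names are hypothetical, and this is not the paper's code or its proposed fairness-mitigating estimator:

    import numpy as np

    def is_estimate(trajectories, pi_e, pi_b):
        """Ordinary per-trajectory importance sampling estimate of pi_e's value.

        trajectories: list of [(state, action, reward), ...] generated by pi_b.
        pi_e, pi_b:   functions (state, action) -> action probability.
        """
        estimates = []
        for traj in trajectories:
            weight = 1.0
            ret = 0.0
            for (s, a, r) in traj:
                weight *= pi_e(s, a) / pi_b(s, a)  # cumulative likelihood ratio
                ret += r                           # undiscounted return
            estimates.append(weight * ret)
        return np.mean(estimates)

    def select_policy(trajectories, candidates, pi_b):
        """Pick the candidate policy with the largest IS estimate.

        Because long trajectories accumulate many likelihood ratios, this
        selection rule can implicitly favor policies that produce shorter
        trajectories, which is the unfairness the paper studies.
        """
        scores = [is_estimate(trajectories, pi, pi_b) for pi in candidates]
        return int(np.argmax(scores))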

Citation (APA)
Doroudi, S., Thomas, P. S., & Brunskill, E. (2018). Importance sampling for fair policy selection. In IJCAI International Joint Conference on Artificial Intelligence (Vol. 2018-July, pp. 5239–5243). International Joint Conferences on Artificial Intelligence. https://doi.org/10.24963/ijcai.2018/729
