Towards Safe Policy Learning under Partial Identifiability: A Causal Approach

Abstract

Learning personalized treatment policies is a formative challenge in many real-world applications, including healthcare, econometrics, and artificial intelligence. However, the effectiveness of candidate policies is not always identifiable, i.e., it is not uniquely computable from the combination of the available data and assumptions about the generating mechanisms. This paper studies policy learning from data collected in various non-identifiable settings, namely (1) observational studies with unobserved confounding; (2) randomized experiments with partial observability; and (3) their combinations. We derive sharp, closed-form bounds on the conditional treatment effects from observational and experimental data. Based on these novel bounds, we further characterize the problem of safe policy learning and develop an algorithm that trains, from data, a policy guaranteed to achieve at least the performance of the baseline policy currently deployed. Finally, we validate our proposed algorithm on synthetic data and a large clinical trial, demonstrating that it guarantees safe behaviors and robust performance.
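
To make the idea of safety under partial identifiability concrete, the following is a minimal sketch and not the algorithm from the paper. It assumes a binary treatment and that interval bounds on the conditional treatment effect E[Y_1 - Y_0 | Z = z] are already available for each covariate stratum z. The function name `safe_policy`, the stratum labels, and the interval values are all hypothetical and chosen only for illustration. The rule departs from the baseline only where the whole interval lies on one side of zero, so, provided the bounds are valid, the resulting policy cannot do worse than the baseline in any stratum.

```python
def safe_policy(cate_bounds, baseline):
    """Conservative decision rule over interval-valued treatment effects.

    cate_bounds: {stratum: (lower, upper)} bounding E[Y_1 - Y_0 | Z = stratum]
    baseline:    {stratum: 0 or 1}, the action the deployed policy takes
    Returns      {stratum: 0 or 1}
    """
    policy = {}
    for z, (lower, upper) in cate_bounds.items():
        if lower > 0:
            # Every effect value consistent with the bounds favors treatment.
            policy[z] = 1
        elif upper < 0:
            # Every effect value consistent with the bounds favors control.
            policy[z] = 0
        else:
            # The sign of the effect is not identified: keep the baseline.
            policy[z] = baseline[z]
    return policy


# Hypothetical intervals, made up for illustration (not results from the paper).
example_bounds = {
    "stratum_a": (0.10, 0.40),    # effect is positive throughout
    "stratum_b": (-0.30, -0.05),  # effect is negative throughout
    "stratum_c": (-0.20, 0.25),   # interval contains zero
}
example_baseline = {"stratum_a": 0, "stratum_b": 1, "stratum_c": 1}

learned = safe_policy(example_bounds, example_baseline)
print(learned)
```

In this sketch the policy switches away from the baseline in `stratum_a` and `stratum_b`, where the sign of the effect is determined, and keeps the baseline action in `stratum_c`, where it is not. Tighter bounds shrink the set of strata in which the rule has to fall back on the baseline.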

Cite

Joshi, S., Zhang, J., & Bareinboim, E. (2024). Towards Safe Policy Learning under Partial Identifiability: A Causal Approach. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 38, pp. 13004–13012). Association for the Advancement of Artificial Intelligence. https://doi.org/10.1609/aaai.v38i12.29198
