Abstract
Interactive adaptive systems powered by Reinforcement Learning (RL) have many potential applications, such as intelligent tutoring systems. In such systems there is typically an external human system designer who creates, monitors, and modifies the interactive adaptive system, trying to improve its performance on the target outcomes. In this paper we focus on the algorithmic foundations of how to help the system designer choose the set of sensors or features that define the observation space used by a reinforcement learning agent. We present an algorithm, value driven representation (VDR), that can iteratively and adaptively augment the observation space of a reinforcement learning agent so that it is sufficient to capture a (near) optimal policy. To do so, we introduce a new method to optimistically estimate the value of a policy using offline simulated Monte Carlo rollouts. We evaluate the performance of our approach on standard RL benchmarks with simulated humans and demonstrate significant improvement over prior baselines.
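To make the idea of "optimistically estimating the value of a policy using offline simulated Monte Carlo rollouts" concrete, here is a minimal, hypothetical sketch. It is not the authors' actual VDR implementation: all names, the visit-count threshold, and the Rmax-style optimistic tail bonus are illustrative assumptions about what such an estimator could look like.

```python
# Hypothetical sketch of an optimistic Monte Carlo policy value estimate.
# Rollouts are simulated offline; when a rollout reaches a state-action pair
# with too little data, the remaining return is optimistically completed with
# the maximum achievable discounted value (an Rmax-style bonus). All names
# and constants here are assumptions, not the paper's actual algorithm.

R_MAX = 1.0        # assumed upper bound on per-step reward
GAMMA = 0.95       # discount factor
HORIZON = 50       # maximum rollout length
MIN_COUNT = 5      # visits needed before a state-action pair counts as known

def optimistic_rollout(env, policy, counts):
    """One simulated rollout; unknown pairs receive the optimistic tail value."""
    state = env.reset()
    ret, discount = 0.0, 1.0
    for _ in range(HORIZON):
        action = policy(state)
        if counts.get((state, action), 0) < MIN_COUNT:
            # Unknown pair: optimistically assume maximum reward forever after.
            ret += discount * R_MAX / (1.0 - GAMMA)
            break
        state, reward, done = env.step(action)
        ret += discount * reward
        discount *= GAMMA
        if done:
            break
    return ret

def optimistic_value(env, policy, counts, n_rollouts=100):
    """Average optimistic return over offline Monte Carlo rollouts."""
    return sum(optimistic_rollout(env, policy, counts)
               for _ in range(n_rollouts)) / n_rollouts
```

In a representation-selection loop, an estimate like this would let the designer compare candidate observation spaces: a representation whose best policy has a high optimistic value is worth exploring further, while one whose optimistic value is already low can be pruned.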
Citation
Keramati, R., & Brunskill, E. (2019). Value driven representation for human-in-the-loop Reinforcement Learning. In ACM UMAP 2019 - Proceedings of the 27th ACM Conference on User Modeling, Adaptation and Personalization (pp. 177–180). Association for Computing Machinery, Inc. https://doi.org/10.1145/3320435.3320471