Universal reinforcement learning algorithms: Survey and experiments

Abstract

State-of-the-art reinforcement learning (RL) algorithms typically assume that the environment is an ergodic Markov Decision Process (MDP). In contrast, the field of universal reinforcement learning (URL) is concerned with algorithms that make as few assumptions as possible about the environment. The universal Bayesian agent AIXI and a family of related URL algorithms have been developed in this setting. While numerous theoretical optimality results have been proven for these agents, there has been no empirical investigation of their behavior to date. We present a short and accessible survey of these URL algorithms under a unified notation and framework, along with experimental results that qualitatively illustrate properties of the resulting policies and their relative performance on partially observable gridworld environments. We also present an open-source reference implementation of the algorithms, which we hope will facilitate further understanding of, and experimentation with, these ideas.
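For context (this is not part of the paper's abstract): the AIXI agent mentioned above selects actions by Bayes-optimal expectimax planning over a class of environments weighted by algorithmic simplicity, as defined by Hutter (2005). A standard statement of the action-selection rule, using the usual notation of percepts e_k = (o_k, r_k), planning horizon m, environment class \mathcal{M}, and Kolmogorov complexity K, is

a_t = \arg\max_{a_t} \sum_{e_t} \cdots \max_{a_m} \sum_{e_m} \Big[ \sum_{k=t}^{m} r_k \Big] \sum_{\nu \in \mathcal{M}} 2^{-K(\nu)}\, \nu(e_{1:m} \mid a_{1:m}),

i.e., expected total reward is maximized under a Solomonoff-style mixture over all environments in \mathcal{M}. The surveyed URL algorithms are variants of, or computable approximations to, this agent.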

Citation (APA)

Aslanides, J., Leike, J., & Hutter, M. (2017). Universal reinforcement learning algorithms: Survey and experiments. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence (IJCAI-17) (pp. 1403–1410). International Joint Conferences on Artificial Intelligence. https://doi.org/10.24963/ijcai.2017/194
