PILCO: A model-based and data-efficient approach to policy search

Marc Peter Deisenroth; Carl Edward Rasmussen

Conference Proceedings

Proceedings of the 28th International Conference on Machine Learning, ICML 2011 (2011) 465-472

Abstract

In this paper, we introduce PILCO, a practical, data-efficient model-based policy search method. PILCO reduces model bias, one of the key problems of model-based reinforcement learning, in a principled way. By learning a probabilistic dynamics model and explicitly incorporating model uncertainty into long-term planning, PILCO can cope with very little data and facilitates learning from scratch in only a few trials. Policy evaluation is performed in closed form using state-of-the-art approximate inference. Furthermore, policy gradients are computed analytically for policy improvement. We report unprecedented learning efficiency on challenging and high-dimensional control tasks. Copyright 2011 by the author(s)/owner(s).
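The idea of a probabilistic dynamics model that reports its own uncertainty can be illustrated with a small Gaussian process regression sketch. This is not the authors' implementation: the one-dimensional toy system (next state = sin(state)), the fixed kernel hyperparameters, and all function names below are assumptions made for illustration, and the sketch covers neither PILCO's closed-form long-term predictions nor its analytic policy gradients. It only shows that the predictive variance is small near observed transitions and grows where no data exist, which is the quantity a planner can take into account to reduce model bias.

```python
import math

def rbf_kernel(a, b, length_scale=1.0, signal_var=1.0):
    """Squared-exponential covariance between two scalar inputs."""
    return signal_var * math.exp(-0.5 * ((a - b) / length_scale) ** 2)

def cholesky(A):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

def solve_lower(L, b):
    """Forward substitution: solve L y = b."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y

def solve_lower_transposed(L, y):
    """Back substitution: solve L^T x = y."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_predict(x_train, y_train, x_query, noise_var=1e-4):
    """Return the GP posterior mean and variance of the latent function at x_query."""
    n = len(x_train)
    K = [[rbf_kernel(x_train[i], x_train[j]) + (noise_var if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    L = cholesky(K)
    alpha = solve_lower_transposed(L, solve_lower(L, y_train))
    k_star = [rbf_kernel(x, x_query) for x in x_train]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = solve_lower(L, k_star)
    var = rbf_kernel(x_query, x_query) - sum(vi * vi for vi in v)
    return mean, max(var, 0.0)  # clamp tiny negative values caused by round-off

# Hypothetical transition data: five observed states and their successors.
states = [-1.0, -0.5, 0.0, 0.5, 1.0]
next_states = [math.sin(s) for s in states]

mean_near, var_near = gp_predict(states, next_states, 0.25)  # inside the data
mean_far, var_far = gp_predict(states, next_states, 4.0)     # far from the data
print(f"near data: mean={mean_near:.3f}, variance={var_near:.5f}")
print(f"far away:  mean={mean_far:.3f}, variance={var_far:.5f}")
```

In this toy setting the query at 4.0 lies several length scales from any observation, so its variance stays close to the prior signal variance, while the query at 0.25 is tightly constrained by its neighbours. A practical implementation would also fit the kernel hyperparameters to the data rather than fixing them as done here.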

Citation (APA)

Deisenroth, M. P., & Rasmussen, C. E. (2011). PILCO: A model-based and data-efficient approach to policy search. In Proceedings of the 28th International Conference on Machine Learning, ICML 2011 (pp. 465–472).
