Efficient Sample Reuse in EM-Based Policy Search


Abstract

Direct policy search is a promising reinforcement learning framework, in particular for control of continuous, high-dimensional systems such as anthropomorphic robots. However, because of its high flexibility, policy search often requires a large number of samples to obtain a stable policy-update estimator, which is prohibitive when sampling is expensive. In this paper, we extend an EM-based policy search method so that previously collected samples can be efficiently reused. The usefulness of the proposed method, called Reward-weighted Regression with sample Reuse (R3), is demonstrated through a robot learning experiment. © 2009 Springer.
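The core idea sketched in the abstract — reward-weighted regression (RWR) with reuse of samples drawn under earlier policies via importance weighting — can be illustrated with a minimal toy example. This is not the R3 estimator from the paper; it is an illustrative sketch under simplifying assumptions: a one-step (bandit-style) task, a linear-Gaussian policy with fixed variance, and a hypothetical reward function, all invented here for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_logpdf(a, mean, sigma):
    # Log-density of a univariate Gaussian N(mean, sigma^2) at a.
    return -0.5 * np.log(2 * np.pi * sigma**2) - (a - mean) ** 2 / (2 * sigma**2)

def reward(s, a):
    # Hypothetical one-step task: reward peaks when action a = 2*s.
    return np.exp(-(a - 2.0 * s) ** 2)

theta, sigma = 0.0, 1.0   # linear-Gaussian policy: a ~ N(theta * s, sigma^2)
history = []              # (state, action, reward, behaviour-policy theta) tuples

for it in range(5):
    # Collect a small batch of samples with the current policy.
    s = rng.uniform(0.5, 1.5, size=20)
    a = theta * s + sigma * rng.standard_normal(20)
    r = reward(s, a)
    history.extend(zip(s, a, r, np.full(20, theta)))

    S = np.array([h[0] for h in history])
    A = np.array([h[1] for h in history])
    R = np.array([h[2] for h in history])
    TH = np.array([h[3] for h in history])

    # Importance weights correct for the fact that old samples were drawn
    # under a different (behaviour) policy than the current one.
    iw = np.exp(gaussian_logpdf(A, theta * S, sigma)
                - gaussian_logpdf(A, TH * S, sigma))

    # Reward-weighted regression over ALL stored samples: closed-form
    # weighted least squares with weights = reward * importance weight.
    w = R * iw
    theta = np.sum(w * S * A) / np.sum(w * S * S)

print(f"learned theta = {theta:.2f} (the toy task's optimum is 2.0)")
```

The EM flavour is visible in the update: the weighted least-squares step is the M-step of maximising the reward-weighted log-likelihood of observed actions, while the importance weights let samples from all previous iterations contribute instead of being discarded after each policy update.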

Citation (APA)

Hachiya, H., Peters, J., & Sugiyama, M. (2009). Efficient sample reuse in EM-based policy search. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5781 LNAI, pp. 469–484). https://doi.org/10.1007/978-3-642-04180-8_48
