Abstract
Direct policy search is a promising reinforcement learning framework, particularly for control in continuous, high-dimensional systems such as anthropomorphic robots. However, because of its high flexibility, policy search often requires a large number of samples to obtain a stable estimate of the policy update, which is prohibitive when sampling is expensive. In this paper, we extend an EM-based policy search method so that previously collected samples can be reused efficiently. The usefulness of the proposed method, called Reward-weighted Regression with sample Reuse (R3), is demonstrated through a robot-learning experiment.
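To give a rough sense of the idea, the sketch below shows generic reward-weighted regression with importance-weighted reuse of samples gathered under earlier policies. It is not the paper's R3 algorithm: the toy task, the linear-Gaussian policy, and the plain importance weights are illustrative assumptions only.

```python
# Hypothetical sketch: reward-weighted regression (RWR) with importance-weighted
# reuse of samples from previous policies. The task, policy class, and weighting
# scheme here are assumptions for illustration, not the paper's implementation.
import numpy as np

rng = np.random.default_rng(0)

def gaussian_logpdf(a, mean, sigma):
    # Log density of a Gaussian with the given mean and standard deviation.
    return -0.5 * np.log(2 * np.pi * sigma**2) - (a - mean)**2 / (2 * sigma**2)

def rollout(theta, sigma, n):
    """Sample states, Gaussian actions a ~ N(theta*s, sigma^2), and rewards."""
    s = rng.uniform(-1.0, 1.0, size=n)
    a = theta * s + sigma * rng.standard_normal(n)
    r = np.exp(-(a - 2.0 * s) ** 2)        # toy reward, peaked at a = 2*s
    return s, a, r

theta, sigma = 0.0, 0.5                    # current policy parameters
history = []                               # (s, a, r, behavior_theta) from past iterations

for it in range(20):
    s, a, r = rollout(theta, sigma, n=50)
    history.append((s, a, r, theta))

    # Pool all previously collected samples and importance-weight each one
    # from the policy that generated it to the current policy (plain
    # importance weights here; the paper studies more refined weighting).
    S, A, W = [], [], []
    for s_i, a_i, r_i, th_b in history:
        iw = np.exp(gaussian_logpdf(a_i, theta * s_i, sigma)
                    - gaussian_logpdf(a_i, th_b * s_i, sigma))
        S.append(s_i); A.append(a_i); W.append(r_i * iw)
    S, A, W = map(np.concatenate, (S, A, W))

    # EM-style M-step: reward- and importance-weighted least squares
    # for the policy mean parameter theta.
    theta = np.sum(W * S * A) / np.sum(W * S * S)

print("learned theta:", round(theta, 2))   # moves toward the target slope 2.0
```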
Citation
Hachiya, H., Peters, J., & Sugiyama, M. (2009). Efficient sample reuse in EM-based policy search. In Lecture Notes in Computer Science (Vol. 5781 LNAI, pp. 469–484). https://doi.org/10.1007/978-3-642-04180-8_48