EM-based policy hyper parameter exploration: application to standing and balancing of a two-wheeled smartphone robot

Abstract

This paper proposes a novel policy search algorithm called EM-based Policy Hyper Parameter Exploration (EPHE), which integrates two reinforcement learning algorithms: Policy Gradient with Parameter Exploration (PGPE) and EM-based Reward-Weighted Regression. Like PGPE, EPHE evaluates a deterministic policy in each episode, with the policy parameters sampled from a prior distribution defined by the policy hyperparameters (mean and variance). Following EM-based Reward-Weighted Regression, the policy hyperparameters are updated by reward-weighted averaging, so that neither gradient calculation nor learning-rate tuning is required. The proposed method is tested on benchmark pendulum swing-up and cart-pole balancing tasks, and on a simulation of standing and balancing of a two-wheeled smartphone robot. Experimental results show that EPHE achieves efficient learning without learning-rate tuning, even for a task with discontinuities.
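For concreteness, below is a minimal Python sketch of one EPHE-style update as the abstract describes it: policy parameter vectors are sampled from a Gaussian prior (mean and per-parameter variance), each deterministic policy is evaluated for one episode, and the hyperparameters are re-estimated by reward-weighted averaging with no gradient or learning rate. The best-K selection, the non-negative toy returns, and all names below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def ephe_update(theta_samples, returns, k=None):
    """One hyperparameter update by reward-weighted averaging.

    theta_samples: (N, D) policy parameter vectors drawn from N(mu, diag(sigma**2)).
    returns:       (N,) episodic returns, assumed non-negative here.
    k:             optionally average over only the k best samples
                   (an assumption for this sketch).
    """
    if k is not None:
        idx = np.argsort(returns)[-k:]        # keep the k highest-return episodes
        theta_samples, returns = theta_samples[idx], returns[idx]
    w = returns / returns.sum()               # reward weights, normalized to sum to 1
    mu = w @ theta_samples                    # reward-weighted mean
    var = w @ (theta_samples - mu) ** 2       # reward-weighted per-parameter variance
    return mu, np.sqrt(var)

# Toy usage: a hypothetical objective where the "return" peaks at theta_star.
rng = np.random.default_rng(0)
theta_star = np.array([1.0, -2.0, 0.5])
mu, sigma = np.zeros(3), np.ones(3)
for _ in range(50):
    thetas = rng.normal(mu, sigma, size=(20, 3))                 # sample parameters
    rets = np.exp(-np.sum((thetas - theta_star) ** 2, axis=1))   # returns > 0
    mu, sigma = ephe_update(thetas, rets, k=10)                  # no learning rate
print(mu)  # converges toward theta_star without any gradient or step-size tuning
```

Because the update is a normalized weighted average, it is invariant to the scale of the returns and needs no step size; in a real application the return of each sampled parameter vector would come from rolling out the corresponding deterministic policy.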

Citation (APA)

Wang, J., Uchibe, E., & Doya, K. (2016). EM-based policy hyper parameter exploration: application to standing and balancing of a two-wheeled smartphone robot. Artificial Life and Robotics, 21(1), 125–131. https://doi.org/10.1007/s10015-015-0260-7
