Learning environmental calibration actions for policy self-evolution

Abstract

Reinforcement learning in the physical world is often expensive, so simulators are commonly used to train policies. Because of simulation error, however, policies trained in a simulator are hard to deploy directly in the physical world. How to efficiently reuse these policies in the real environment is therefore a key issue. To address it, this paper presents a policy self-evolution process: in the target environment, the agent first executes a few calibration actions to perceive the environment, and then reuses the previous policies according to its observations of the environment. In this way, policy learning in the target environment is reduced to environment identification through the calibration actions, which requires far fewer samples than learning a policy from scratch. We propose the POSEC (POlicy Self-Evolution by Calibration) approach, which learns the most informative calibration actions for policy self-evolution. Using three robotic-arm control tasks as test beds, we show that the proposed method can learn a good policy for a new arm with only a few (e.g., five) samples from the target environment.
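The abstract does not give POSEC's model or training procedure, so the Python sketch below only illustrates the general calibrate-identify-reuse loop under simplifying assumptions of our own. It assumes a hypothetical 1-D environment family with dynamics s' = s + θ·a, where the scalar θ stands in for an unknown physical parameter of the arm. It also assumes a small library of source policies (derived in closed form here, standing in for policies trained in simulation). Calibration actions are chosen by brute force to maximize the smallest separation between the observations predicted for different source environments, and the target environment is identified by nearest match. All function names are hypothetical, and the brute-force search is a simple stand-in for learning informative calibration actions, not the paper's method.

```python
import itertools

import numpy as np


# Hypothetical environment family: s' = s + theta * a (+ optional Gaussian noise).
def step(state, action, theta, rng=None, noise=0.0):
    eps = rng.normal(0.0, noise) if rng is not None else 0.0
    return state + theta * action + eps


def observe(actions, theta, rng=None, noise=0.0):
    """Execute each calibration action from state 0 and record the next state."""
    return np.array([step(0.0, a, theta, rng, noise) for a in actions])


def pick_calibration_actions(candidates, thetas, k):
    """Brute-force stand-in for learning informative calibration actions:
    pick the k actions whose predicted observations are best separated
    (largest minimum pairwise distance) across the source environments."""
    best, best_score = None, -np.inf
    for combo in itertools.combinations(candidates, k):
        obs = [observe(combo, t) for t in thetas]
        score = min(np.linalg.norm(o1 - o2) for o1, o2 in itertools.combinations(obs, 2))
        if score > best_score:
            best, best_score = combo, score
    return best


def identify(actions, observed, thetas):
    """Index of the source environment whose predicted observations are closest."""
    errors = [np.linalg.norm(observe(actions, t) - observed) for t in thetas]
    return int(np.argmin(errors))


def make_policy(theta):
    """Hypothetical source policy: one-step controller that is exact when the true parameter is theta."""
    return lambda state, goal: (goal - state) / theta


# Usage: reuse a source policy in an unseen target environment (theta = 1.45).
rng = np.random.default_rng(0)
source_thetas = [0.5, 1.0, 1.5, 2.0]
library = [make_policy(t) for t in source_thetas]

actions = pick_calibration_actions([0.0, 0.25, 0.5, 1.0], source_thetas, k=2)
true_theta = 1.45
observed = observe(actions, true_theta, rng, noise=0.01)  # only k target samples
idx = identify(actions, observed, source_thetas)
policy = library[idx]
reached = step(0.0, policy(0.0, 1.0), true_theta)  # try to move from 0 to goal 1
print(actions, source_thetas[idx], round(reached, 3))
```

In this toy setting the zero action carries no information about θ, so the search prefers the largest-magnitude actions, and two target samples are enough to select the θ = 1.5 policy, which reaches the goal approximately. A real system would replace the closed-form policies with trained ones and the exhaustive search with a learned selection of calibration actions.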

Citation (APA)

Zhang, C., Yu, Y., & Zhou, Z. H. (2018). Learning environmental calibration actions for policy self-evolution. In IJCAI International Joint Conference on Artificial Intelligence (Vol. 2018-July, pp. 3061–3067). International Joint Conferences on Artificial Intelligence. https://doi.org/10.24963/ijcai.2018/425