Learning environmental calibration actions for policy self-evolution

Chao Zhang; Yang Yu; Zhi Hua Zhou

Conference Proceedings

IJCAI International Joint Conference on Artificial Intelligence (2018), Vol. 2018-July, pp. 3061–3067

DOI: 10.24963/ijcai.2018/425

Abstract

Reinforcement learning in the physical world is often expensive, so simulators are commonly used to train policies. Because of simulation error, however, policies trained in a simulator are difficult to deploy directly in the physical world. How to efficiently reuse these policies in the real environment is therefore a key issue. To address it, this paper presents a policy self-evolution process: in the target environment, the agent first executes a few calibration actions to perceive the environment, and then reuses the previous policies according to its observations of the environment. In this way, policy learning in the target environment is reduced to environment identification through executing the calibration actions, which requires far fewer samples than learning a policy from scratch. We propose the POSEC (POlicy Self-Evolution by Calibration) approach, which learns the most informative calibration actions for policy self-evolution. Using three robotic-arm control tasks as test beds, we show that the proposed method can learn a good policy for a new arm with only a few (e.g., five) samples from the target environment.
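The abstract describes the self-evolution loop only at a high level: execute a few calibration actions, identify the environment from the responses, then reuse a previously trained policy. The sketch below illustrates that loop under toy assumptions that are not taken from the paper: a hypothetical 1-D "arm" whose position changes by `gain * action`, a library of simulator policies indexed by known source gains, a greedy max-min separation heuristic for choosing calibration actions (a stand-in, not POSEC's learning procedure), and nearest-neighbour matching for identification. All function names (`step`, `responses`, `pick_calibration_actions`, `identify`, `make_policy`, `self_evolve`) are hypothetical.

```python
import math
from itertools import combinations

# Hypothetical toy dynamics (assumption): the next position is
# position + gain * action, where `gain` is the unknown environment parameter.
def step(position, action, gain):
    return position + gain * action

def responses(actions, gain):
    """Displacement observed for each calibration action, starting from 0."""
    return [step(0.0, a, gain) for a in actions]

def separation(actions, source_gains):
    """Smallest distance between the response signatures of any two source envs."""
    sigs = [responses(actions, g) for g in source_gains]
    return min(math.dist(p, q) for p, q in combinations(sigs, 2))

def pick_calibration_actions(candidates, source_gains, k):
    """Greedy heuristic (assumption, not POSEC): add the candidate action that
    best separates the source environments, k times."""
    chosen = []
    for _ in range(k):
        remaining = [a for a in candidates if a not in chosen]
        best = max(remaining, key=lambda a: separation(chosen + [a], source_gains))
        chosen.append(best)
    return chosen

def identify(observed, actions, source_gains):
    """Nearest-neighbour identification: index of the source env whose
    predicted responses are closest to the observed ones."""
    return min(range(len(source_gains)),
               key=lambda i: math.dist(observed, responses(actions, source_gains[i])))

def make_policy(gain):
    """Policy 'trained in simulation' for one gain: a proportional controller
    that reaches the target in one step if the true gain equals `gain`."""
    return lambda position, target: (target - position) / gain

def self_evolve(true_gain, source_gains, policies, candidates, k=2):
    actions = pick_calibration_actions(candidates, source_gains, k)
    observed = responses(actions, true_gain)  # only k samples in the target env
    idx = identify(observed, actions, source_gains)
    return idx, policies[idx]

# Usage: the target arm has gain 1.4, which is not in the source library.
source_gains = [0.5, 1.0, 1.5, 2.0]
policies = [make_policy(g) for g in source_gains]
idx, policy = self_evolve(1.4, source_gains, policies, [-1.0, -0.5, 0.5, 1.0])
pos, target = 0.0, 1.0
for _ in range(3):
    pos = step(pos, policy(pos, target), 1.4)
print(source_gains[idx], round(pos, 3))  # prints: 1.5 1.0
```

In this toy setting two calibration actions suffice to pick the closest source policy (gain 1.5), and reusing it brings the arm near the target without retraining. The real method, per the abstract, learns which calibration actions are most informative rather than relying on a fixed heuristic like the one assumed here.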

Citation (APA)

Zhang, C., Yu, Y., & Zhou, Z. H. (2018). Learning environmental calibration actions for policy self-evolution. In IJCAI International Joint Conference on Artificial Intelligence (Vol. 2018-July, pp. 3061–3067). International Joint Conferences on Artificial Intelligence. https://doi.org/10.24963/ijcai.2018/425
