AdaPT: Zero-Shot Adaptive Policy Transfer for Stochastic Dynamical Systems

Abstract

Model-free policy learning has enabled good performance on complex tasks that were previously intractable with traditional control techniques. However, this comes at the cost of requiring a perfectly accurate model for training. This is infeasible due to the very high sample complexity of model-free methods preventing training on the target system. This renders such methods unsuitable for physical systems. Model mismatch due to dynamics parameter differences and unmodeled dynamics error may cause suboptimal or unsafe behavior upon direct transfer. We introduce the Adaptive Policy Transfer for Stochastic Dynamics (AdaPT) algorithm that achieves provably safe and robust, dynamically feasible zero-shot transfer of RL policies to new domains with dynamics error. AdaPT combines the strengths of offline policy learning in a black-box source simulator with online tube-based MPC to attenuate bounded dynamics mismatch between the source and target dynamics. AdaPT allows online transfer of policies, trained offline solely in simulation, to a family of unknown targets without fine-tuning. We also formally show that (i) AdaPT guarantees bounded state and control deviation through state-action tubes under relatively weak technical assumptions, and (ii) AdaPT results in a bounded loss of reward accumulation relative to a policy trained and evaluated in the source environment. We evaluate AdaPT on 2 continuous, non-holonomic simulated dynamical systems with 4 different disturbance models, and find that AdaPT performs between 50% and 300% better on mean reward accrual than direct policy transfer.
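The tube idea mentioned in the abstract can be illustrated with a toy sketch. This is not the AdaPT algorithm: it assumes a scalar linear system, substitutes a fixed linear feedback gain for the tube-based MPC, and uses a hand-written stand-in for the learned policy. All names and constants (`nominal_rollout`, `track_on_target`, `A`, `B`, `GAIN`, `W_MAX`) are hypothetical. It only shows how a nominal trajectory generated on the source model can be tracked on a disturbed target so that the deviation stays inside a computable bound.

```python
import random


def nominal_rollout(policy, x0, a, b, horizon):
    """Roll the policy out on the disturbance-free source model x' = a*x + b*u."""
    xs, us = [x0], []
    for _ in range(horizon):
        u = policy(xs[-1])
        us.append(u)
        xs.append(a * xs[-1] + b * u)
    return xs, us


def track_on_target(xs_nom, us_nom, a, b, gain, w_max, seed=0):
    """Apply the nominal controls on the disturbed target, plus a feedback
    correction that pulls the state back toward the nominal trajectory.
    Returns the largest deviation from the nominal state."""
    rng = random.Random(seed)
    x = xs_nom[0]
    max_dev = 0.0
    for t, u_nom in enumerate(us_nom):
        u = u_nom + gain * (xs_nom[t] - x)  # ancillary correction (assumed linear)
        w = rng.uniform(-w_max, w_max)      # bounded dynamics mismatch
        x = a * x + b * u + w               # target dynamics
        max_dev = max(max_dev, abs(x - xs_nom[t + 1]))
    return max_dev


# Illustrative constants only; none of these come from the paper.
A, B, W_MAX, GAIN = 1.1, 1.0, 0.1, 0.8


def policy(x):
    """Hypothetical stand-in for a policy learned in the source simulator."""
    return -0.6 * (x - 1.0)


xs_nom, us_nom = nominal_rollout(policy, 0.0, A, B, 50)

# The error obeys e' = (a - b*gain) * e + w, so with |a - b*gain| < 1 it
# stays within w_max / (1 - |a - b*gain|): the "tube" radius in this toy case.
tube_radius = W_MAX / (1.0 - abs(A - B * GAIN))
dev_tracked = track_on_target(xs_nom, us_nom, A, B, GAIN, W_MAX)
```

In this sketch the bound follows from the geometric series for the error recursion. Setting `gain` to `0.0` removes the correction, and because `A` exceeds 1 the same argument then gives no bound on the deviation.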

Cite

Harrison, J., Garg, A., Ivanovic, B., Zhu, Y., Savarese, S., Fei-Fei, L., & Pavone, M. (2020). AdaPT: Zero-Shot Adaptive Policy Transfer for Stochastic Dynamical Systems. In Springer Proceedings in Advanced Robotics (Vol. 10, pp. 437–453). Springer Science and Business Media B.V. https://doi.org/10.1007/978-3-030-28619-4_34
