By harnessing a growing dataset of robot experience, we learn control policies for a diverse and increasing set of related manipulation tasks. To make this possible, we introduce reward sketching: an effective way of eliciting human preferences to learn the reward function for a new task. This reward function is then used to retrospectively annotate all historical data, collected for different tasks, with predicted rewards for the new task. The resulting massive annotated dataset can then be used to learn manipulation policies with batch reinforcement learning (RL) from visual input in a completely off-line way, i.e., without interactions with the real robot. This approach makes it possible to scale up RL in robotics, as we no longer need to run the robot for each step of learning. We show that the trained batch RL agents, when deployed on real robots, can perform a variety of challenging tasks involving multiple interactions among rigid or deformable objects. Moreover, they display a significant degree of robustness and generalization. In some cases, they even outperform human teleoperators.
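The central mechanism described above is retrospective annotation: once a reward function has been learned from human sketches, every stored episode is relabeled with predicted rewards for the new task before offline training. The following is a minimal sketch of that relabeling step, not the paper's implementation; the `reward_model` stand-in and the `episodes` data layout are hypothetical placeholders for the learned reward network and the stored robot experience.

```python
import numpy as np

# Hypothetical stand-in for the learned reward network. In the paper, a
# model is trained from human reward sketches to predict a per-timestep
# reward from a visual observation; here a trivial callable is used so
# the sketch is runnable.
def reward_model(frame: np.ndarray) -> float:
    """Predict the new task's reward for a single observation frame."""
    return float(frame.mean())  # placeholder; a real model would be a trained network

def annotate_dataset(episodes):
    """Retrospectively label stored episodes with predicted rewards for
    the new task, so data originally collected for other tasks can be
    reused by an offline (batch) RL learner."""
    annotated = []
    for episode in episodes:
        rewards = [reward_model(frame) for frame in episode["frames"]]
        annotated.append({**episode, "predicted_rewards": rewards})
    return annotated

# Toy usage: two fake episodes of 8x8 grayscale frames.
episodes = [{"frames": [np.random.rand(8, 8) for _ in range(5)]}
            for _ in range(2)]
dataset_for_batch_rl = annotate_dataset(episodes)
```

The annotated dataset would then be consumed by a batch RL algorithm that trains entirely from this stored, relabeled experience, without further robot interaction.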