Tool use requires reasoning about the fit between an object’s affordances and the demands of a task. Visual affordance learning can benefit from goal-directed interaction experience, but current techniques rely on human labels or expert demonstrations to generate this data. In this paper, we describe a method that grounds affordances in physical interactions instead, thus removing the need for human labels or expert policies. We use an efficient sampling-based method to generate successful trajectories that provide contact data, which are then used to reveal affordance representations. Our framework, GIFT, operates in two phases: first, we discover visual affordances from goal-directed interaction with a set of procedurally generated tools; second, we train a model to predict new instances of the discovered affordances on novel tools in a self-supervised fashion. In our experiments, we show that GIFT can leverage a sparse keypoint representation to predict grasp and interaction points that accommodate multiple tasks, such as hooking, reaching, and hammering. GIFT outperforms baselines on all tasks and matches a human oracle on two of three tasks using novel tools. Qualitative results available at: www.pair.toronto.edu/gift-tools-rss21.
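As a rough illustration of the two-phase workflow described above, the sketch below mocks up both phases with plain NumPy: sampling-based rollouts on procedurally generated "tools" yield contact data from successful trajectories, which are aggregated into a sparse grasp/interaction keypoint pair and then used as self-supervised targets for a predictor evaluated on a novel tool. Every name here (ToolEnv-style point sets, sample_trajectory, discover_keypoints, the linear predictor) is a hypothetical placeholder for illustration only, not the GIFT implementation, which runs rollouts in simulation and trains a neural keypoint model on visual input.

```python
# Schematic sketch (not the authors' code) of the two-phase pipeline
# described in the abstract. NumPy is the only dependency; the "tools"
# are random point sets standing in for procedurally generated meshes.
import numpy as np

rng = np.random.default_rng(0)

# ---- Phase 1: discover affordances from goal-directed interaction ----
def sample_trajectory(tool_points):
    """Hypothetical sampling-based rollout: pick a grasp point and a
    contact point on the tool, return success plus the contact data."""
    grasp = tool_points[rng.integers(len(tool_points))]
    contact = tool_points[rng.integers(len(tool_points))]
    # Stand-in success criterion; a real system would run a simulator.
    success = np.linalg.norm(contact - grasp) > 0.5
    return success, grasp, contact

def discover_keypoints(tool_points, n_samples=200):
    """Aggregate contacts from successful rollouts into a sparse
    (grasp, interaction) keypoint pair, here by simple averaging."""
    grasps, contacts = [], []
    for _ in range(n_samples):
        ok, g, c = sample_trajectory(tool_points)
        if ok:
            grasps.append(g)
            contacts.append(c)
    return np.mean(grasps, axis=0), np.mean(contacts, axis=0)

# ---- Phase 2: self-supervised keypoint prediction on novel tools ----
def fit_linear_predictor(tools, keypoints):
    """Least-squares map from a flattened tool representation to its
    discovered keypoints; a real model would be a neural network."""
    X = np.stack([t.ravel() for t in tools])
    Y = np.stack([np.concatenate(k) for k in keypoints])
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return W

# Procedurally generated "tools" and their phase-1 pseudo-labels.
tools = [rng.uniform(-1, 1, size=(32, 3)) for _ in range(20)]
labels = [discover_keypoints(t) for t in tools]
W = fit_linear_predictor(tools, labels)

# Predict grasp and interaction points for a novel tool.
novel_tool = rng.uniform(-1, 1, size=(32, 3))
pred = novel_tool.ravel() @ W
print("predicted grasp point:", pred[:3], "interaction point:", pred[3:])
```

The linear least-squares stand-in is only there to make the phase-2 idea concrete; in the paper, the predictor is a learned model that outputs affordance keypoints from visual observations of unseen tools.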
Turpin, D., Wang, L., Tsogkas, S., Dickinson, S., & Garg, A. (2021). GIFT: Generalizable Interaction-aware Functional Tool Affordances without Labels. In Robotics: Science and Systems. MIT Press Journals. https://doi.org/10.15607/RSS.2021.XVII.060