Guiding interaction behaviors for multi-modal grounded language learning

Abstract

Multi-modal grounded language learning connects language predicates to physical properties of objects in the world. Sensing with multiple modalities, such as audio, haptics, and visual colors and shapes, while performing interaction behaviors like lifting, dropping, and looking at objects enables a robot to ground non-visual predicates like "empty" as well as visual predicates like "red". Previous work has established that grounding in multi-modal space improves performance on object retrieval from human descriptions. In this work, we gather behavior annotations from humans and demonstrate that these improve language grounding performance by allowing a system to focus on relevant behaviors for words like "white" or "half-full", which can be understood by looking or lifting, respectively. We also explore adding modality annotations (whether to focus on audio or haptics when performing a behavior), which improves performance, and sharing information between linguistically related predicates (if "green" is a color, "white" is a color), which improves grounding recall but at the cost of precision.
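The idea described above can be illustrated with a small, hypothetical sketch: one binary classifier per (behavior, modality) sensorimotor context, with a predicate's score pooled only over the contexts that a human behavior/modality annotation allows. This is a minimal illustrative example in Python, not the authors' implementation; the PredicateGrounder class, the particular context list, and the use of scikit-learn SVMs are assumptions made for the sketch.

```python
# Hypothetical sketch (not the paper's code): per-context classifiers whose
# confidences are pooled only over human-annotated behavior/modality contexts.
import numpy as np
from sklearn.svm import SVC

# Each (behavior, modality) pair defines a sensorimotor "context" (assumed set).
CONTEXTS = [
    ("look", "vision"), ("lift", "haptics"), ("lift", "audio"),
    ("drop", "audio"), ("drop", "haptics"),
]


class PredicateGrounder:
    """One binary classifier per context; a predicate's score is the mean
    positive-class confidence over the contexts it is allowed to use."""

    def __init__(self, allowed_contexts=None):
        # With no annotation, fall back to using every context.
        self.allowed = list(allowed_contexts or CONTEXTS)
        self.models = {c: SVC(probability=True) for c in self.allowed}

    def fit(self, features, labels):
        # features: {context: (n_objects, dim) array}; labels: (n_objects,) in {0, 1}
        for c, clf in self.models.items():
            clf.fit(features[c], labels)

    def score(self, features):
        # Average positive-class probability over the allowed contexts only.
        probs = [clf.predict_proba(features[c])[:, 1]
                 for c, clf in self.models.items()]
        return np.mean(probs, axis=0)


# Example: "half-full" annotated as a lifting/haptics predicate, so the
# grounder ignores the vision and audio contexts entirely (toy data).
rng = np.random.default_rng(0)
feats = {c: rng.normal(size=(20, 8)) for c in CONTEXTS}
labels = np.array([0, 1] * 10)

half_full = PredicateGrounder(allowed_contexts=[("lift", "haptics")])
half_full.fit(feats, labels)
print(half_full.score(feats)[:5])
```

Restricting the pooled contexts this way is what lets an annotation like "understood by lifting" keep irrelevant sensors (e.g., color for "half-full") from diluting the predicate's decision.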

Citation (APA)

Thomason, J., Sinapov, J., & Mooney, R. J. (2017). Guiding interaction behaviors for multi-modal grounded language learning. In Proceedings of the 1st Workshop on Language Grounding for Robotics, RoboNLP 2017 at the 55th Annual Meeting of the Association for Computational Linguistics, ACL 2017 (pp. 20–24). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/w17-2803
