Jointly learning grounded task structures from language instruction and visual demonstration

Abstract

To enable language-based communication and collaboration with cognitive robots, this paper presents an approach where an agent can learn task models jointly from language instruction and visual demonstration using an And-Or Graph (AoG) representation. The learned AoG captures a hierarchical task structure in which linguistic labels (for language communication) are grounded to corresponding state changes in the physical environment (for perception and action). Our empirical results on a cloth-folding domain show that, although state detection through visual processing is uncertain and error-prone, through tight integration with language the agent is able to learn an effective AoG for task representation. The learned AoG can further be applied to infer and interpret ongoing actions from new visual demonstrations using linguistic labels at different levels of granularity.
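To make the And-Or Graph idea concrete, the sketch below shows one possible way such a task representation could be encoded. This is not the authors' implementation: the node types, the cloth-folding decomposition ("fold the shirt" into folding each sleeve), and the state-change predicates are illustrative assumptions, meant only to show how linguistic labels might be attached to nodes and grounded to detected state changes at different levels of granularity.

```python
# A minimal, illustrative sketch of an And-Or Graph (AoG) task representation.
# NOT the paper's implementation: node names, the cloth-folding decomposition,
# and the state-change predicates are assumptions made for illustration.
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# A "state change" is modelled here as a predicate over a symbolic world state
# (e.g. a detected cloth configuration before and after an action).
StateChange = Callable[[Dict[str, str], Dict[str, str]], bool]


@dataclass
class AoGNode:
    """One node of the And-Or Graph.

    kind:
      - "and": all children are realized in order (sub-task sequence)
      - "or":  exactly one child is realized (alternative ways to do the task)
      - "terminal": a primitive action grounded to a visual state change
    label: the linguistic label grounding this node for language communication.
    """
    label: str
    kind: str  # "and" | "or" | "terminal"
    children: List["AoGNode"] = field(default_factory=list)
    state_change: Optional[StateChange] = None  # only used by terminal nodes

    def interpret(self, before: Dict[str, str], after: Dict[str, str]) -> bool:
        """Check whether an observed state change can be explained by this node."""
        if self.kind == "terminal":
            return self.state_change is not None and self.state_change(before, after)
        # For a single observed change, both "or" and "and" nodes accept it if
        # any sub-step explains it (a coarse-grained interpretation).
        return any(child.interpret(before, after) for child in self.children)


# Hypothetical cloth-folding fragment: "fold the shirt" decomposes into
# folding the left and right sleeves, each grounded to a detected change
# of that sleeve's configuration.
def sleeve_folded(side: str) -> StateChange:
    return lambda before, after: before.get(side) == "flat" and after.get(side) == "folded"


fold_left = AoGNode("fold the left sleeve", "terminal", state_change=sleeve_folded("left_sleeve"))
fold_right = AoGNode("fold the right sleeve", "terminal", state_change=sleeve_folded("right_sleeve"))
fold_shirt = AoGNode("fold the shirt", "and", children=[fold_left, fold_right])

if __name__ == "__main__":
    before = {"left_sleeve": "flat", "right_sleeve": "flat"}
    after = {"left_sleeve": "folded", "right_sleeve": "flat"}
    # Interpret the same observed change at two levels of granularity.
    print(fold_left.interpret(before, after))   # True  (fine-grained label)
    print(fold_shirt.interpret(before, after))  # True  (coarse-grained label)
```

In this sketch, interpreting a new demonstration amounts to walking the graph and reporting the labels of nodes whose grounded state changes match the observation, which mirrors the paper's idea of describing ongoing actions with labels of varying granularity.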

Cite

APA

Liu, C., Yang, S., Saba-Sadiya, S., Shukla, N., He, Y., Zhu, S. C., & Chai, J. Y. (2016). Jointly learning grounded task structures from language instruction and visual demonstration. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (EMNLP 2016) (pp. 1482–1492). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/d16-1155
