To enable language-based communication and collaboration with cognitive robots, this paper presents an approach in which an agent learns task models jointly from language instruction and visual demonstration using an And-Or Graph (AoG) representation. The learned AoG captures a hierarchical task structure in which linguistic labels (for language communication) are grounded to corresponding state changes in the physical environment (for perception and action). Our empirical results on a cloth-folding domain show that, although state detection through visual processing is uncertain and error-prone, tight integration with language enables the agent to learn an effective AoG for task representation. The learned AoG can further be applied to infer and interpret ongoing actions from new visual demonstrations using linguistic labels at different levels of granularity.
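The abstract describes the AoG as a hierarchical task structure whose nodes carry linguistic labels grounded to visually detected state changes. The sketch below is a minimal, illustrative rendering of such a structure, assuming a simple node layout and a hypothetical cloth-folding fragment; the class names, grounding strings, and example decomposition are assumptions for illustration, not the paper's actual representation or API.

```python
# A minimal sketch of an And-Or Graph (AoG) task structure, for illustration only.
# Node fields, grounding strings, and the folding example are assumptions,
# not the paper's actual representation.
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AoGNode:
    """A node in the task AoG: a linguistic label grounded to a state change."""
    label: str                          # linguistic label used for communication
    node_type: str                      # "and" (decomposition), "or" (alternatives), or "terminal"
    state_change: Optional[str] = None  # grounded state change detected from vision (terminals)
    children: List["AoGNode"] = field(default_factory=list)


# Hypothetical cloth-folding fragment: "fold the shirt" decomposes (And-node)
# into two steps, each grounded to a visually detectable state change.
fold_sleeve = AoGNode("fold the left sleeve", "terminal",
                      state_change="left_sleeve: extended -> folded")
fold_half = AoGNode("fold the shirt in half", "terminal",
                    state_change="shirt: flat -> halved")
fold_shirt = AoGNode("fold the shirt", "and",
                     children=[fold_sleeve, fold_half])


def interpret(node: AoGNode, depth: int = 0) -> None:
    """Walk the hierarchy, printing labels at different levels of granularity."""
    grounding = f"  [{node.state_change}]" if node.state_change else ""
    print("  " * depth + node.label + grounding)
    for child in node.children:
        interpret(child, depth + 1)


if __name__ == "__main__":
    interpret(fold_shirt)
```

Traversing the structure from the root down illustrates how the same activity can be reported coarsely ("fold the shirt") or at finer granularity (the individual grounded steps), which is the kind of multi-level interpretation the abstract refers to.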
CITATION STYLE
Liu, C., Yang, S., Saba-Sadiya, S., Shukla, N., He, Y., Zhu, S. C., & Chai, J. Y. (2016). Jointly learning grounded task structures from language instruction and visual demonstration. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (EMNLP 2016) (pp. 1482–1492). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/d16-1155