Interactive Natural Language Grounding via Referring Expression Comprehension and Scene Graph Parsing

Abstract

Natural language provides an intuitive and effective interface for interaction between humans and robots. Several approaches have been proposed to address natural language visual grounding for human-robot interaction. However, most existing approaches resolve the ambiguity of natural language queries and ground target objects through dialogue systems, which makes the interaction cumbersome and time-consuming. In contrast, we address interactive natural language grounding without auxiliary information. Specifically, we first propose a referring expression comprehension network to ground natural referring expressions. The network extracts visual semantics via a visual semantic-aware network and exploits the rich linguistic context in expressions through a language attention network. Furthermore, we combine the referring expression comprehension network with scene graph parsing to ground unrestricted and complicated natural language queries. Finally, we validate the referring expression comprehension network on three public datasets, and we evaluate the effectiveness of the interactive natural language grounding architecture by grounding a wide range of natural language queries in different household scenarios.
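
The two-stage pipeline described in the abstract, parsing a complicated query into a scene graph and grounding each node with a referring expression comprehension (REC) scorer, can be illustrated with a minimal sketch. The snippet below is not the authors' implementation: it assumes spaCy for dependency parsing, represents candidate objects as simple labeled boxes, and stands in for the REC network with a trivial lexical-overlap stub (rec_score).

```python
# Minimal, illustrative sketch of grounding via scene graph parsing + an REC scorer.
# Assumptions: spaCy ("en_core_web_sm") is available; Region, parse_scene_graph,
# rec_score, and ground are hypothetical names introduced here for illustration.
import spacy
from dataclasses import dataclass
from typing import List, Tuple

nlp = spacy.load("en_core_web_sm")

@dataclass
class Region:
    box: Tuple[float, float, float, float]   # candidate object box (x1, y1, x2, y2)
    label: str                                # detector class name, e.g. "cup"

def parse_scene_graph(query: str) -> List[Tuple[str, str, str]]:
    """Extract <subject, relation, object> triples from the query using
    dependency relations (a simplified stand-in for a scene graph parser)."""
    doc = nlp(query)
    triples = []
    for token in doc:
        # Pattern: noun --prep--> preposition --pobj--> noun, e.g. "cup on the table"
        if token.dep_ == "pobj" and token.head.dep_ == "prep":
            prep = token.head
            triples.append((prep.head.text, prep.text, token.text))
    return triples

def rec_score(phrase: str, region: Region) -> float:
    """Stub for the referring expression comprehension network: returns a
    matching score between a phrase and a candidate region."""
    return 1.0 if region.label in phrase.lower() else 0.0

def ground(query: str, regions: List[Region]) -> List[Tuple[str, Region]]:
    """Ground every node of the parsed scene graph to its best-scoring region."""
    groundings = []
    for subj, _rel, obj in parse_scene_graph(query):
        for phrase in (subj, obj):
            best = max(regions, key=lambda r: rec_score(phrase, r))
            groundings.append((phrase, best))
    return groundings

if __name__ == "__main__":
    regions = [Region((10, 10, 60, 60), "cup"), Region((0, 80, 200, 200), "table")]
    print(ground("the cup on the table", regions))
```

For the query "the cup on the table", the parser yields the triple (cup, on, table), and each noun phrase is assigned its highest-scoring candidate region; in the paper, the scoring is performed by the visual semantic-aware and language attention networks rather than the lexical-overlap stub used here.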

Citation (APA)

Mi, J., Lyu, J., Tang, S., Li, Q., & Zhang, J. (2020). Interactive Natural Language Grounding via Referring Expression Comprehension and Scene Graph Parsing. Frontiers in Neurorobotics, 14. https://doi.org/10.3389/fnbot.2020.00043
