Explicit Object Relation Alignment for Vision and Language Navigation


Abstract

In this paper, we investigate the problem of vision and language navigation. Solving this problem requires grounding the landmarks and spatial relations mentioned in the textual instructions to the visual modality. We propose a neural agent, the Explicit Object Relation Alignment agent (EXOR), that explicitly aligns the spatial information in the instruction with the visual environment, including landmarks and the spatial relationships between the agent and those landmarks. Empirically, our proposed method surpasses the baseline by a large margin on the R2R dataset. We provide a comprehensive analysis of our model's spatial reasoning ability and explainability.
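As a rough illustration of what such explicit landmark alignment can look like, the sketch below scores candidate views by matching landmark phrase embeddings from the instruction against detected-object features from each view. This is not the authors' EXOR implementation: the module name LandmarkObjectAligner, the feature dimensions, the shared projection space, and the max-over-objects scoring rule are all illustrative assumptions.

```python
# Minimal sketch of explicit landmark-object alignment; NOT the authors'
# released code. Dimensions and the similarity scheme are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LandmarkObjectAligner(nn.Module):
    """Aligns landmark phrase embeddings from an instruction with
    detected-object features per candidate view, and scores each view
    by how well its objects cover the mentioned landmarks."""

    def __init__(self, text_dim: int = 300, obj_dim: int = 2048, hidden: int = 256):
        super().__init__()
        # Project both modalities into a shared space (assumed design).
        self.text_proj = nn.Linear(text_dim, hidden)
        self.obj_proj = nn.Linear(obj_dim, hidden)

    def forward(self, landmark_emb: torch.Tensor, obj_feats: torch.Tensor) -> torch.Tensor:
        # landmark_emb: (L, text_dim), embeddings of landmark phrases
        #   extracted from the instruction.
        # obj_feats:    (V, O, obj_dim), object-detector features for
        #   each of V candidate views with O objects each.
        t = F.normalize(self.text_proj(landmark_emb), dim=-1)  # (L, H)
        o = F.normalize(self.obj_proj(obj_feats), dim=-1)      # (V, O, H)
        # Cosine similarity between every landmark and every object.
        sim = torch.einsum("lh,voh->vlo", t, o)                # (V, L, O)
        # Best-matching object per landmark in each view, averaged over
        # landmarks, yields one alignment score per candidate view.
        return sim.max(dim=-1).values.mean(dim=-1)             # (V,)


if __name__ == "__main__":
    aligner = LandmarkObjectAligner()
    landmarks = torch.randn(3, 300)     # e.g. "sofa", "staircase", "door"
    objects = torch.randn(4, 10, 2048)  # 4 candidate views, 10 objects each
    print(aligner(landmarks, objects))  # one alignment score per view
```

A navigation agent could feed such per-view scores into its action distribution, so that views whose visible objects better match the instructed landmarks are preferred; modeling the agent-to-landmark spatial relations the abstract mentions would require additional structure beyond this sketch.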

Citation (APA)

Zhang, Y., & Kordjamshidi, P. (2022). Explicit Object Relation Alignment for Vision and Language Navigation. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop (pp. 322–331). Association for Computational Linguistics. https://doi.org/10.18653/v1/2022.acl-srw.24
