CLEAR: Improving Vision-Language Navigation with Cross-Lingual, Environment-Agnostic Representations


Abstract

Vision-and-Language Navigation (VLN) tasks require an agent to navigate through an environment based on language instructions. In this paper, we aim to solve two key challenges in this task: utilizing multilingual instructions for improved instruction-path grounding, and navigating through new environments that are unseen during training. To address these challenges, first, our agent learns a shared and visually-aligned cross-lingual language representation for the three languages (English, Hindi, and Telugu) in the Room-Across-Room dataset. Our language representation learning is guided by text pairs that are aligned by visual information. Second, our agent learns an environment-agnostic visual representation by maximizing the similarity between semantically-aligned image pairs (with constraints on object-matching) from different environments. Our environment-agnostic visual representation can mitigate the environment bias induced by low-level visual information. Empirically, on the Room-Across-Room dataset, we show that our multilingual agent achieves large improvements in all metrics over the strong baseline model when generalizing to unseen environments with the cross-lingual language representation and the environment-agnostic visual representation. Furthermore, we show that our learned language and visual representations can be successfully transferred to the Room-to-Room and Cooperative Vision-and-Dialogue Navigation tasks, and we present detailed qualitative and quantitative generalization and grounding analyses.
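The abstract describes both representations as being learned by maximizing similarity between aligned pairs: image pairs from different environments that pass an object-matching constraint, and text pairs aligned through visual information (for example, instructions in different languages that describe the same path; this pairing is an assumption here). The sketch below illustrates that general idea with a standard InfoNCE-style contrastive loss and a Jaccard overlap of detected object labels as the matching constraint. It is a minimal illustration under those assumptions, not the authors' implementation: the helper names (`object_overlap`, `select_pairs`, `info_nce`), the 0.5 threshold, the temperature, and the toy labels and embeddings are all hypothetical, and the paper's actual objective, encoders, and pairing rules may differ.

```python
import math
from typing import List, Sequence, Set, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def object_overlap(objs_a: Set[str], objs_b: Set[str]) -> float:
    """Jaccard overlap of detected object labels (hypothetical matching criterion)."""
    union = objs_a | objs_b
    return len(objs_a & objs_b) / len(union) if union else 0.0


def select_pairs(objects_a: List[Set[str]], objects_b: List[Set[str]],
                 threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Pair view i of environment A with view j of environment B when their
    object labels overlap by at least `threshold` (hypothetical constraint)."""
    return [(i, j)
            for i, oa in enumerate(objects_a)
            for j, ob in enumerate(objects_b)
            if object_overlap(oa, ob) >= threshold]


def info_nce(anchors: List[Vector], positives: List[Vector],
             temperature: float = 0.1) -> float:
    """Mean InfoNCE loss: anchors[i] should score highest against positives[i],
    with every other positives[k] acting as an in-batch negative."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        peak = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_sum_exp - logits[i]  # equals -log softmax(logits)[i]
    return total / len(anchors)


# Toy example: two views per environment with made-up object labels and embeddings.
objects_env_a = [{"bed", "lamp", "window"}, {"sink", "mirror", "towel"}]
objects_env_b = [{"sink", "mirror", "toilet"}, {"bed", "lamp", "rug"}]
emb_env_a = [[1.0, 0.1], [0.1, 1.0]]
emb_env_b = [[0.2, 0.9], [0.9, 0.2]]

pairs = select_pairs(objects_env_a, objects_env_b)  # [(0, 1), (1, 0)]
anchors = [emb_env_a[i] for i, _ in pairs]
positives = [emb_env_b[j] for _, j in pairs]

aligned_loss = info_nce(anchors, positives)
mismatched_loss = info_nce(anchors, positives[::-1])
print(f"aligned={aligned_loss:.4f} mismatched={mismatched_loss:.4f}")
```

Only the bedroom-to-bedroom and bathroom-to-bathroom views survive the object constraint, and the loss is lower when each anchor is paired with its semantically matching view than when the pairing is shuffled. In a real training setup the embeddings would come from learned encoders and the loss would be minimized with an autodiff framework; the same contrastive form could in principle be applied to cross-lingual instruction pairs.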

Li, J., Tan, H., & Bansal, M. (2022). CLEAR: Improving Vision-Language Navigation with Cross-Lingual, Environment-Agnostic Representations. In Findings of the Association for Computational Linguistics: NAACL 2022 (pp. 633–649). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.findings-naacl.48