Look and Answer the Question: On the Role of Vision in Embodied Question Answering

4 citations · 17 Mendeley readers

Abstract

We focus on the Embodied Question Answering (EQA) task, its dataset, and its models (Das et al., 2018). In particular, we examine the effects of vision perturbations at different levels by providing the model with incongruent, black, or random-noise images. We observe that the models are still able to learn from general visual patterns, suggesting that they capture some common-sense reasoning about the visual world. We argue that better data and models are required to improve performance in predicting (generating) correct answers. The code is available at: https://github.com/GU-CLASP/embodied-qa.
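The three perturbation conditions mentioned above can be illustrated with a minimal sketch. This is not the authors' implementation from the linked repository; the `frames` tensor shape and the `perturb_frames` helper are assumptions made purely for illustration.

```python
# A minimal sketch (not the paper's code) of the three vision-perturbation
# conditions described in the abstract: incongruent, black, and random-noise images.
import torch

def perturb_frames(frames: torch.Tensor, mode: str) -> torch.Tensor:
    """Perturb an episode's visual input (assumed shape: N x C x H x W, values in [0, 1])."""
    if mode == "incongruent":
        # Shuffle frames so questions are paired with unrelated visual scenes.
        return frames[torch.randperm(frames.size(0))]
    if mode == "black":
        # All-zero images: the model receives no visual signal at all.
        return torch.zeros_like(frames)
    if mode == "noise":
        # Uniform random noise in place of the real observations.
        return torch.rand_like(frames)
    # "none": leave the original observations untouched.
    return frames
```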

Citation (APA)

Ilinykh, N., Emampoor, Y., & Dobnik, S. (2022). Look and Answer the Question: On the Role of Vision in Embodied Question Answering. In 15th International Natural Language Generation Conference, INLG 2022 (pp. 236–245). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.inlg-main.19
