Look and Answer the Question: On the Role of Vision in Embodied Question Answering

4 citations · 17 Mendeley readers

Abstract

We focus on the Embodied Question Answering (EQA) task, its dataset, and its models (Das et al., 2018). In particular, we examine the effects of vision perturbations at different levels by providing the model with incongruent, black, or random-noise images. We observe that the models are still able to learn from general visual patterns, suggesting that they capture some common-sense reasoning about the visual world. We argue that better data and models are required to improve performance in predicting (generating) correct answers. The code is available at: https://github.com/GU-CLASP/embodied-qa.
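The three perturbation conditions mentioned above can be illustrated with a minimal sketch. This is not the authors' implementation from the linked repository; the `frames` tensor shape and the `perturb_frames` helper are assumptions made purely for illustration.

```python
# A minimal sketch (not the paper's code) of the three vision-perturbation
# conditions described in the abstract: incongruent, black, and random-noise images.
import torch

def perturb_frames(frames: torch.Tensor, mode: str) -> torch.Tensor:
    """Perturb an episode's visual input (assumed shape: N x C x H x W, values in [0, 1])."""
    if mode == "incongruent":
        # Shuffle frames so questions are paired with unrelated visual scenes.
        return frames[torch.randperm(frames.size(0))]
    if mode == "black":
        # All-zero images: the model receives no visual signal at all.
        return torch.zeros_like(frames)
    if mode == "noise":
        # Uniform random noise in place of the real observations.
        return torch.rand_like(frames)
    # "none": leave the original observations untouched.
    return frames
```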

Citation (APA)

Ilinykh, N., Emampoor, Y., & Dobnik, S. (2022). Look and Answer the Question: On the Role of Vision in Embodied Question Answering. In 15th International Natural Language Generation Conference, INLG 2022 (pp. 236–245). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.inlg-main.19
