VQA-E: Explaining, elaborating, and enhancing your answers for visual questions

Qing Li; Qingyi Tao; Shafiq Joty; Jianfei Cai; Jiebo Luo

Conference Proceedings, Open Access

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2018) 11211 LNCS 570-586

DOI: 10.1007/978-3-030-01234-2_34

10 Citations

150 Readers

Abstract

Most existing works in visual question answering (VQA) focus on improving the accuracy of predicted answers while disregarding explanations. We argue that the explanation for an answer is as important as the answer itself, or even more so, since it makes the question answering process more understandable and traceable. To this end, we propose a new task, VQA-E (VQA with Explanation), in which models are required to generate an explanation along with the predicted answer. We first construct a new dataset, and then frame the VQA-E problem in a multi-task learning architecture. Our VQA-E dataset is automatically derived from the VQA v2 dataset by intelligently exploiting the available captions. We also conduct a user study to validate the quality of the synthesized explanations. We quantitatively show that the additional supervision from explanations not only produces insightful textual sentences to justify the answers, but also improves the performance of answer prediction. Our model outperforms the state-of-the-art methods by a clear margin on the VQA v2 dataset.
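
The multi-task framing mentioned in the abstract can be illustrated with a minimal sketch of a joint objective: one loss term for the predicted answer and one for the generated explanation, summed with a weight. The abstract does not specify the loss functions or the weighting, so the function names, the use of cross-entropy for both terms, and the `explanation_weight` parameter below are assumptions made purely for illustration, not the authors' implementation.

```python
import math
from typing import Sequence


def cross_entropy(probs: Sequence[float], target: int) -> float:
    """Negative log-probability assigned to the target index."""
    return -math.log(probs[target])


def explanation_nll(step_probs: Sequence[Sequence[float]],
                    tokens: Sequence[int]) -> float:
    """Mean per-token negative log-likelihood of a reference explanation.

    step_probs[i] is the (assumed) decoder distribution over the vocabulary
    at step i; tokens[i] is the reference token id at that step.
    """
    if len(step_probs) != len(tokens) or not tokens:
        raise ValueError("need one non-empty distribution per reference token")
    total = sum(cross_entropy(p, t) for p, t in zip(step_probs, tokens))
    return total / len(tokens)


def multitask_loss(answer_probs: Sequence[float],
                   answer: int,
                   step_probs: Sequence[Sequence[float]],
                   tokens: Sequence[int],
                   explanation_weight: float = 1.0) -> float:
    """Answer loss plus weighted explanation loss (hypothetical weighting)."""
    answer_loss = cross_entropy(answer_probs, answer)
    return answer_loss + explanation_weight * explanation_nll(step_probs, tokens)


if __name__ == "__main__":
    # Toy distributions: 2 candidate answers, 2-word vocabulary, 2-token explanation.
    loss = multitask_loss([0.5, 0.5], 0, [[0.5, 0.5], [0.25, 0.75]], [0, 0])
    print(f"joint loss: {loss:.4f}")
```

Setting `explanation_weight` to zero in this sketch reduces the objective to answer prediction alone, which is the kind of baseline against which the benefit of explanation supervision would be measured.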

Cite

Li, Q., Tao, Q., Joty, S., Cai, J., & Luo, J. (2018). VQA-E: Explaining, elaborating, and enhancing your answers for visual questions. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11211 LNCS, pp. 570–586). Springer Verlag. https://doi.org/10.1007/978-3-030-01234-2_34
