Graph Strategy for Interpretable Visual Question Answering

Abstract

In this paper, we consider the task of Visual Question Answering, an important task for creating General Artificial Intelligence (AI) systems. We propose an interpretable model called GS-VQA. Its main idea is that a complex compositional question can be decomposed into a sequence of simple questions about objects’ properties and their relations. We use the Unified estimator to answer the questions in that sequence and test the proposed model on the CLEVR and THOR-VQA datasets. The GS-VQA model achieves results comparable to the state of the art while keeping the response generation process transparent and interpretable.
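
The decomposition idea in the abstract can be illustrated with a toy sketch. This is not the GS-VQA implementation or the Unified estimator: the scene, the relation triples, the operation names (`filter`, `relate`, `query`) and the hand-written step sequence are all hypothetical, chosen only to show how a compositional question can be answered as a chain of simple questions while keeping an inspectable trace.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Obj:
    name: str
    color: str
    shape: str
    size: str


# Hypothetical toy scene: objects plus (subject, relation, object) triples.
SCENE = [
    Obj("o1", "red", "cube", "large"),
    Obj("o2", "blue", "sphere", "small"),
    Obj("o3", "green", "cylinder", "small"),
]
RELATIONS = {("o2", "left_of", "o1"), ("o3", "right_of", "o1")}


def filter_by(objs, attr, value):
    """Simple question: which of these objects have attr == value?"""
    return [o for o in objs if getattr(o, attr) == value]


def relate(objs, relation):
    """Simple question: which scene objects stand in `relation` to any of objs?"""
    names = {o.name for o in objs}
    return [o for o in SCENE
            if any((o.name, relation, n) in RELATIONS for n in names)]


def query(objs, attr):
    """Simple question: what is the attr of the single remaining object?"""
    if len(objs) != 1:
        raise ValueError("expected exactly one object")
    return getattr(objs[0], attr)


OPS = {"filter": filter_by, "relate": relate, "query": query}


def answer(steps):
    """Run the simple steps in order; the trace records every intermediate state."""
    state, trace = SCENE, []
    for op, *args in steps:
        state = OPS[op](state, *args)
        trace.append((op, args, state))
    return state, trace


# "What color is the sphere to the left of the red cube?" (decomposed by hand)
STEPS = [
    ("filter", "color", "red"),
    ("filter", "shape", "cube"),
    ("relate", "left_of"),
    ("filter", "shape", "sphere"),
    ("query", "color"),
]
result, trace = answer(STEPS)
print(result)  # blue
```

Each entry in `trace` shows which simple question was asked and what it returned, which is the kind of step-by-step transparency the abstract refers to. In this sketch every step is a symbolic lookup; the abstract indicates that in the paper the simple questions are answered by a learned estimator instead.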

Cite

Sarkisyan, C., Savelov, M., Kovalev, A. K., & Panov, A. I. (2023). Graph Strategy for Interpretable Visual Question Answering. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 13539 LNAI, pp. 86–99). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-19907-3_9
