Lightweight Visual Question Answering using Scene Graphs

Abstract

Visual question answering (VQA) is a challenging problem in machine perception that requires a deep joint understanding of both visual and textual data. Recent research has advanced the automatic generation of high-quality scene graphs from images, while graph neural networks (GNNs), which are powerful yet elegant models, have proven effective at reasoning over graph-structured data. In this work, we propose to bridge the gap between scene graph generation and VQA by leveraging GNNs. In particular, we design a new model called Conditional Enhanced Graph ATtention network (CE-GAT) to encode pairs of visual and semantic scene graphs with both node and edge features. CE-GAT is integrated with a textual question encoder and generates answers through question-graph conditioning. Moreover, to alleviate the difficulty of training CE-GAT for VQA, we enforce more useful inductive biases in the scene graphs through novel question-guided graph enriching and pruning. Finally, we evaluate the framework on one of the largest available VQA datasets, GQA, using ground-truth scene graphs. It achieves an accuracy of 77.87%, compared with 63.17% for the state of the art, the neural state machine (NSM). Notably, by leveraging existing scene graphs, our framework is much lighter than end-to-end VQA methods (e.g., about 95.3% fewer parameters than a typical NSM).
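The abstract does not spell out how question-guided pruning works, so the following is only a minimal sketch of the general idea under stated assumptions: keep the scene-graph nodes whose labels are mentioned in the question, plus their neighbours within a fixed number of hops. The `SceneGraph` class, the `prune_by_question` function, the word-matching rule, and the `hops` parameter are all hypothetical illustrations, not the authors' procedure.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SceneGraph:
    # Hypothetical container: node id -> object label, e.g. {0: "man"}.
    nodes: dict[int, str]
    # Directed (subject id, relation label, object id) triples.
    edges: list[tuple[int, str, int]] = field(default_factory=list)


def prune_by_question(graph: SceneGraph, question: str, hops: int = 1) -> SceneGraph:
    """Keep nodes named in the question and their neighbours within `hops` edges.

    Assumption: exact lowercase word matching stands in for whatever
    question-graph matching the paper actually uses.
    """
    tokens = set(question.lower().replace("?", " ").split())
    keep = {n for n, label in graph.nodes.items() if label.lower() in tokens}

    # Expand the kept set along edges, in either direction, `hops` times.
    for _ in range(hops):
        frontier = set()
        for subj, _rel, obj in graph.edges:
            if subj in keep:
                frontier.add(obj)
            if obj in keep:
                frontier.add(subj)
        keep |= frontier

    nodes = {n: label for n, label in graph.nodes.items() if n in keep}
    edges = [(s, r, o) for s, r, o in graph.edges if s in keep and o in keep]
    return SceneGraph(nodes, edges)


graph = SceneGraph(
    nodes={0: "man", 1: "hat", 2: "dog", 3: "frisbee"},
    edges=[(0, "wearing", 1), (2, "catching", 3)],
)
pruned = prune_by_question(graph, "What is the man wearing?")
print(pruned.nodes)  # {0: 'man', 1: 'hat'}
print(pruned.edges)  # [(0, 'wearing', 1)]
```

In this toy example the dog and frisbee are dropped because neither is mentioned in the question nor connected to a mentioned object, which gives a smaller graph for a downstream encoder to process.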

Nuthalapati, S. V., Chandradevan, R., Giunchiglia, E., Li, B., Kayser, M., Lukasiewicz, T., & Yang, C. (2021). Lightweight Visual Question Answering using Scene Graphs. In International Conference on Information and Knowledge Management, Proceedings (pp. 3353–3357). Association for Computing Machinery. https://doi.org/10.1145/3459637.3482218