GraphVQA: Language-Guided Graph Neural Networks for Scene Graph Question Answering

Abstract

Images are more than collections of objects and attributes: they represent a web of relationships among interconnected objects. The scene graph has emerged as a new modality, a structured graphical representation of an image that encodes objects as nodes connected by pairwise relations as edges. To support question answering on scene graphs, we propose GraphVQA, a language-guided graph neural network framework that translates and executes a natural language question as multiple iterations of message passing among graph nodes. We explore the design space of the GraphVQA framework and discuss the trade-offs of different design choices. Our experiments on the GQA dataset show that GraphVQA outperforms the state-of-the-art model by a large margin (94.78% vs. 88.43%). Our code is available at https://github.com/codexxxl/GraphVQA.
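The core idea of executing a question as iterated message passing can be sketched in a few lines. The snippet below is an illustrative assumption of the scheme, not the paper's exact model: each reasoning step derives an instruction vector from the question, and that vector gates the messages sent along scene graph edges before nodes update their states.

```python
def language_guided_message_passing(node_feats, edges, instructions):
    """Run one round of instruction-gated message passing per reasoning step.

    node_feats:   dict node_id -> list[float]   (scene graph node features)
    edges:        list of (src, dst) node-id pairs (pairwise relations)
    instructions: list of instruction vectors, one per reasoning step,
                  each the same dimension as the node features

    Hypothetical sketch: the element-wise gating and residual update are
    simplifying assumptions standing in for learned GNN layers.
    """
    feats = {n: list(v) for n, v in node_feats.items()}
    dim = len(next(iter(feats.values())))
    for ins in instructions:                       # one round per reasoning step
        messages = {n: [0.0] * dim for n in feats}
        for src, dst in edges:
            # gate the sender's features element-wise by the instruction vector
            gated = [f * g for f, g in zip(feats[src], ins)]
            messages[dst] = [m + x for m, x in zip(messages[dst], gated)]
        # residual update: each node keeps its state and adds aggregated messages
        feats = {n: [f + m for f, m in zip(feats[n], messages[n])]
                 for n in feats}
    return feats


# Toy scene graph: node 0 ("dog") relates to node 1 ("ball") via one edge.
out = language_guided_message_passing(
    node_feats={0: [1.0, 2.0], 1: [0.0, 0.0]},
    edges=[(0, 1)],
    instructions=[[1.0, 1.0]],  # a single, fully "open" reasoning step
)
```

In the full model, the instruction vectors would come from a question encoder and the gating and update functions would be learned layers (e.g. attention-based); the loop structure over reasoning steps is the part that mirrors the framework described in the abstract.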

Citation (APA)

Liang, W., Jiang, Y., & Liu, Z. (2021). GraphVQA: Language-Guided Graph Neural Networks for Scene Graph Question Answering. In Multimodal Artificial Intelligence, MAI Workshop 2021 - Proceedings of the 3rd Workshop (pp. 79–86). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2021.maiworkshop-1.12
