Question-Guided Hybrid Convolution for Visual Question Answering

Abstract

In this paper, we propose a novel Question-Guided Hybrid Convolution (QGHC) network for Visual Question Answering (VQA). Most state-of-the-art VQA methods fuse high-level textual and visual features produced by separate neural networks and discard visual spatial information when learning multi-modal features. To address this problem, question-guided kernels generated from the input question are convolved with visual features to capture the textual-visual relationship at an early stage. Question-guided convolution tightly couples textual and visual information, but learning the kernels introduces additional parameters. We therefore apply group convolution, consisting of question-independent and question-dependent kernels, to reduce the number of parameters and alleviate over-fitting. The resulting hybrid convolution generates discriminative multi-modal features with fewer parameters. The proposed approach is also complementary to existing bilinear-pooling fusion and attention-based VQA methods; integrating it with them further boosts performance. Experiments on VQA datasets validate the effectiveness of QGHC.
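To make the hybrid-convolution idea concrete, below is a minimal PyTorch sketch, not the authors' implementation: a question embedding predicts the kernels of half of the groups in a group convolution, while the other half are ordinary learned kernels. The class name QGHCBlock, the 256-channel / 1024-dimensional feature sizes, and the 50/50 split between question-independent and question-dependent groups are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class QGHCBlock(nn.Module):
    """Illustrative hybrid group convolution guided by a question embedding."""

    def __init__(self, channels=256, q_dim=1024, groups=8, ksize=3):
        super().__init__()
        assert channels % groups == 0
        self.groups = groups
        self.ksize = ksize
        self.cpg = channels // groups            # channels per group
        self.dep_groups = groups // 2            # question-dependent groups (assumed split)
        self.ind_groups = groups - self.dep_groups

        # Question-independent kernels: ordinary learned parameters.
        self.ind_weight = nn.Parameter(
            0.01 * torch.randn(self.ind_groups * self.cpg, self.cpg, ksize, ksize))

        # Question-dependent kernels are predicted from the question feature.
        dep_params = self.dep_groups * self.cpg * self.cpg * ksize * ksize
        self.kernel_pred = nn.Linear(q_dim, dep_params)

    def forward(self, v, q):
        """v: (B, C, H, W) visual feature map; q: (B, q_dim) question embedding."""
        outs = []
        for b in range(v.size(0)):               # kernels differ per sample, so loop over the batch
            dep_w = self.kernel_pred(q[b]).view(
                self.dep_groups * self.cpg, self.cpg, self.ksize, self.ksize)
            weight = torch.cat([self.ind_weight, dep_w], dim=0)
            outs.append(F.conv2d(v[b:b + 1], weight,
                                 padding=self.ksize // 2, groups=self.groups))
        return torch.cat(outs, dim=0)


# Example: fuse a 14x14 visual feature map with a question embedding.
block = QGHCBlock(channels=256, q_dim=1024, groups=8)
fused = block(torch.randn(2, 256, 14, 14), torch.randn(2, 1024))
print(fused.shape)  # torch.Size([2, 256, 14, 14])
```

Because each group only sees channels/groups of the input channels, the number of question-predicted kernel parameters shrinks by roughly a factor of the group count compared with generating a full (ungrouped) convolution from the question, which is the parameter-saving effect the abstract describes.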

Citation (APA)

Gao, P., Li, H., Li, S., Lu, P., Li, Y., Hoi, S. C. H., & Wang, X. (2018). Question-Guided Hybrid Convolution for Visual Question Answering. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11205 LNCS, pp. 485–501). Springer Verlag. https://doi.org/10.1007/978-3-030-01246-5_29
