Multi-question learning for visual question answering


Abstract

Visual Question Answering (VQA) poses a great challenge for the computer vision and natural language processing communities. Most existing approaches consider video-question pairs individually during training. However, we observe that in a VQA task there are usually multiple questions (whether sequentially generated or not) about the target video, and these questions carry rich semantic relations among themselves. To exploit these relations, we propose a new paradigm for VQA termed Multi-Question Learning (MQL). Inspired by multi-task learning, MQL jointly learns from multiple questions, together with their corresponding answers, for a target video sequence. The learned representations of video-question pairs are thus more general and transfer better to new questions. We further propose an effective VQA framework and design a training procedure for MQL, in which a specifically designed attention network models the relation between the input video and its corresponding questions, enabling multiple video-question pairs to be co-trained. Experimental results on public datasets show the favorable performance of the proposed MQL-VQA framework compared to state-of-the-art methods.
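To make the co-training idea in the abstract concrete, below is a minimal sketch (not the authors' released code) of how Multi-Question Learning can be set up: one video is encoded once, each of its questions attends over the shared video features, and the answer losses of all questions are summed so the video-question pairs are trained jointly. All module names and sizes (VideoQuestionAttention, hidden_dim, the 2048-dimensional frame features, etc.) are illustrative assumptions, not details taken from the paper.

```python
# Illustrative MQL sketch: shared video encoding + per-question attention,
# with a joint loss over all questions for the same video.
import torch
import torch.nn as nn


class VideoQuestionAttention(nn.Module):
    """Attend over video frame features conditioned on a question vector."""

    def __init__(self, hidden_dim):
        super().__init__()
        self.score = nn.Linear(hidden_dim * 2, 1)

    def forward(self, video_feats, question_vec):
        # video_feats: (T, D) frame features; question_vec: (D,)
        q = question_vec.unsqueeze(0).expand_as(video_feats)                     # (T, D)
        weights = torch.softmax(self.score(torch.cat([video_feats, q], -1)), 0)  # (T, 1)
        return (weights * video_feats).sum(dim=0)                                # (D,)


class MQLModel(nn.Module):
    def __init__(self, num_answers, hidden_dim=512):
        super().__init__()
        self.frame_proj = nn.Linear(2048, hidden_dim)          # e.g. CNN frame features
        self.question_enc = nn.GRU(300, hidden_dim, batch_first=True)
        self.attention = VideoQuestionAttention(hidden_dim)
        self.classifier = nn.Linear(hidden_dim * 2, num_answers)

    def forward(self, frames, question_embs):
        # frames: (T, 2048); question_embs: list of (L_i, 300) word-embedding tensors
        video_feats = torch.relu(self.frame_proj(frames))       # shared across all questions
        logits = []
        for q in question_embs:
            _, h = self.question_enc(q.unsqueeze(0))             # h: (1, 1, D)
            q_vec = h.squeeze(0).squeeze(0)
            v_att = self.attention(video_feats, q_vec)
            logits.append(self.classifier(torch.cat([v_att, q_vec], dim=-1)))
        return torch.stack(logits)                               # (num_questions, num_answers)


def mql_loss(model, frames, question_embs, answer_ids):
    """Joint loss over all questions of one video: the MQL co-training signal."""
    logits = model(frames, question_embs)
    return nn.functional.cross_entropy(logits, answer_ids)
```

The key design choice this sketch highlights is that the video representation is computed once and shared, while gradients from every question's answer loss flow back through it, which is what makes the learned video-question representations more general than training each pair in isolation.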

Cite (APA)

Lei, C., Wu, L., Liu, D., Li, Z., Wang, G., Tang, H., & Li, H. (2020). Multi-question learning for visual question answering. In AAAI 2020 - 34th AAAI Conference on Artificial Intelligence (pp. 11328–11335). AAAI press. https://doi.org/10.1609/aaai.v34i07.6794
