Ask Your Neurons: A Deep Learning Approach to Visual Question Answering


Abstract

We propose a Deep Learning approach to the visual question answering task, where machines answer questions about real-world images. By combining the latest advances in image representation and natural language processing, we propose Ask Your Neurons, a scalable, jointly trained, end-to-end formulation of this problem. In contrast to previous efforts, we face a multi-modal problem where the language output (answer) is conditioned on both visual and natural language inputs (image and question). We evaluate our approaches on the DAQUAR as well as the VQA dataset, where we also report various baselines, including an analysis of how much information is contained in the language part alone. To study human consensus, we propose two novel metrics and collect additional answers, extending the original DAQUAR dataset to DAQUAR-Consensus. Finally, we evaluate a rich set of design choices for how to encode, combine, and decode information in our proposed Deep Learning formulation.
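The core idea sketched in the abstract, conditioning the answer on both the image and the question, can be illustrated with a minimal toy model. All dimensions, weights, and the additive fusion below are illustrative assumptions: the paper uses a CNN image encoder and an LSTM question encoder, which are replaced here by a fixed feature vector and a simple tanh-RNN.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy dimensions (not taken from the paper).
VOCAB, EMBED, HIDDEN, IMG_DIM, ANSWERS = 100, 32, 64, 48, 10

W_embed = rng.normal(0, 0.1, (VOCAB, EMBED))        # word embeddings
W_rnn = rng.normal(0, 0.1, (EMBED + HIDDEN, HIDDEN))  # toy RNN cell (stand-in for an LSTM)
W_img = rng.normal(0, 0.1, (IMG_DIM, HIDDEN))       # projects CNN image features
W_out = rng.normal(0, 0.1, (HIDDEN, ANSWERS))       # decodes the joint state into answer scores


def encode_question(token_ids):
    """Run a toy tanh-RNN over the question tokens; the final state encodes the question."""
    h = np.zeros(HIDDEN)
    for t in token_ids:
        x = W_embed[t]
        h = np.tanh(np.concatenate([x, h]) @ W_rnn)
    return h


def answer_logits(image_feat, token_ids):
    """Condition the answer on both modalities: fuse the question state with the
    projected image feature, then decode one score per candidate answer."""
    q = encode_question(token_ids)
    v = np.tanh(image_feat @ W_img)
    joint = q + v  # additive fusion: one of many possible "combine" design choices
    return joint @ W_out


image_feat = rng.normal(0, 1, IMG_DIM)  # stand-in for a CNN's output for one image
question = [3, 17, 42, 5]               # stand-in token ids for a question
logits = answer_logits(image_feat, question)
print(logits.shape)  # (10,): one score per candidate answer
```

In a trained model the weights would be learned jointly end to end, and the fusion step (here a simple sum) is exactly the kind of encode/combine/decode design choice the paper evaluates.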

Citation (APA)

Malinowski, M., Rohrbach, M., & Fritz, M. (2017). Ask Your Neurons: A Deep Learning Approach to Visual Question Answering. International Journal of Computer Vision, 125(1–3), 110–135. https://doi.org/10.1007/s11263-017-1038-2
