Visual Question Answering Using Semantic Information from Image Descriptions

Abstract

In this work, we propose a deep neural architecture that uses an attention mechanism over region-based image features, the natural language question asked, and semantic knowledge extracted from the regions of an image to produce open-ended answers in a visual question answering (VQA) task. Combining region-based visual features with region-based textual information about the image enables a model to answer questions more accurately, and potentially to do so with less training data. We evaluate our proposed architecture on a VQA task against a strong baseline and show that our method achieves excellent results.
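
The abstract does not give implementation details, but the following PyTorch sketch illustrates the general idea it describes: question-guided attention over region-based image features, fused with encodings of per-region textual descriptions to predict an open-ended answer. Everything concrete here, including the module name RegionSemanticVQA, the GRU encoders, the layer sizes, and the concatenation-based fusion, is an assumption made for illustration, not the authors' actual architecture.

# Minimal illustrative sketch (not the paper's implementation) of attention over
# region features conditioned on the question, combined with semantic features
# from per-region textual descriptions. All sizes and names are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RegionSemanticVQA(nn.Module):
    def __init__(self, vocab_size, num_answers, region_dim=2048,
                 embed_dim=300, hidden_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        # Encoders for the question and for each region's description.
        self.q_rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.d_rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        # Project visual region features into the shared hidden space.
        self.v_proj = nn.Linear(region_dim, hidden_dim)
        # Scores each region from its visual, semantic, and question signals.
        self.att = nn.Linear(hidden_dim * 3, 1)
        self.classifier = nn.Sequential(
            nn.Linear(hidden_dim * 3, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_answers),
        )

    def forward(self, regions, question, descriptions):
        # regions:      (B, R, region_dim) region-based image features
        # question:     (B, Tq) token ids of the question
        # descriptions: (B, R, Td) token ids of each region's description
        B, R, Td = descriptions.shape
        q, _ = self.q_rnn(self.embed(question))           # (B, Tq, H)
        q = q[:, -1]                                      # (B, H) question summary
        d, _ = self.d_rnn(self.embed(descriptions.reshape(B * R, Td)))
        d = d[:, -1].reshape(B, R, -1)                    # (B, R, H) semantic features
        v = torch.tanh(self.v_proj(regions))              # (B, R, H) visual features
        # Question-guided attention weights over the R regions.
        qe = q.unsqueeze(1).expand(-1, R, -1)
        logits = self.att(torch.cat([v, d, qe], dim=-1)).squeeze(-1)  # (B, R)
        w = F.softmax(logits, dim=-1).unsqueeze(-1)
        v_att = (w * v).sum(dim=1)                        # attended visual context
        d_att = (w * d).sum(dim=1)                        # attended semantic context
        # Fuse all three signals and classify over candidate answers.
        return self.classifier(torch.cat([v_att, d_att, q], dim=-1))

# Quick shape check with random inputs.
model = RegionSemanticVQA(vocab_size=10000, num_answers=3000)
out = model(torch.randn(2, 36, 2048),
            torch.randint(1, 10000, (2, 12)),
            torch.randint(1, 10000, (2, 36, 8)))
print(out.shape)  # torch.Size([2, 3000])

One design note on the sketch: sharing a single set of attention weights between the visual and semantic channels ties each region's description to its appearance; a model could equally well attend over the two channels separately.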

Citation (APA)
Tasrin, T., Al Nahian, M. S., & Harrison, B. (2021). Visual Question Answering Using Semantic Information from Image Descriptions. In Proceedings of the International Florida Artificial Intelligence Research Society Conference, FLAIRS (Vol. 34). Florida Online Journals, University of Florida. https://doi.org/10.32473/flairs.v34i1.128460
