Self-attention based visual dialogue

Abstract

We improve performance on the task of Visual Dialogue. We integrate a self-attention mechanism to improve the results reported in the original Visual Dialogue paper. Visual Dialogue differs from other downstream tasks and serves as a general test of machine intelligence: the model has to be proficient enough in both vision and language that individual answers can be assessed and progress can be measured. The dataset used in this paper is VisDial v0.9, collected by researchers at Georgia Tech. We used the same train/test splits as the original paper to evaluate the results. It contains approximately 1.2 million question-answer pairs, with ten question-answer pairs for each of ~120,000 images from COCO. To keep the comparison fair and simple, we used the same encoder-decoder architecture, namely the Late Fusion encoder and the discriminative decoder. We added the self-attention module from the SAGAN paper to the encoder. The inclusion of the self-attention module was motivated by the observation that many answers from the Visual Dialog model were based solely on the questions asked and not on the image. The hypothesis is therefore that the self-attention module will make the model attend to the image while generating an answer.
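As a concrete illustration of the module the abstract refers to, the following is a minimal sketch of SAGAN-style self-attention over image feature maps, written in PyTorch. The class name, the channel reduction factor, and the exact point of integration into the Late Fusion encoder are assumptions for illustration and may differ from the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SelfAttention2d(nn.Module):
    """SAGAN-style self-attention over spatial feature maps."""

    def __init__(self, in_channels: int, reduction: int = 8):
        super().__init__()
        # 1x1 convolutions project features into query/key/value spaces
        self.query = nn.Conv2d(in_channels, in_channels // reduction, kernel_size=1)
        self.key = nn.Conv2d(in_channels, in_channels // reduction, kernel_size=1)
        self.value = nn.Conv2d(in_channels, in_channels, kernel_size=1)
        # gamma starts at 0, so the module initially passes features through unchanged
        self.gamma = nn.Parameter(torch.zeros(1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)   # (B, HW, C/r)
        k = self.key(x).flatten(2)                      # (B, C/r, HW)
        v = self.value(x).flatten(2)                    # (B, C, HW)
        attn = F.softmax(torch.bmm(q, k), dim=-1)       # (B, HW, HW) attention map
        out = torch.bmm(v, attn.transpose(1, 2)).view(b, c, h, w)
        return self.gamma * out + x                     # residual connection


# Hypothetical usage: applied to the image feature map before it is fused
# with the question and dialogue-history encodings in the encoder.
features = torch.randn(2, 512, 7, 7)
attended = SelfAttention2d(512)(features)
```

In this sketch, each spatial location attends to every other location in the image feature map, which is the mechanism hypothesized above to keep the answer grounded in the image rather than in the question alone.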

APA

Mathur, V., Jha, D., & Kumar, S. (2019). Self-attention based visual dialogue. International Journal of Recent Technology and Engineering, 8(3), 8792–8795. https://doi.org/10.35940/ijrte.C5306.098319
