Visual dialog with multi-turn attentional memory network

Abstract

Visual dialog is the task of answering a question about an input image given a historical dialog about that image, and it often requires retrieving both visual and textual facts relevant to the question. The problem differs from visual question answering (VQA), which relies only on visual grounding estimated from an image–question pair, whereas visual dialog requires interactions among a question, an input image, and a historical dialog. Most methods rely on a one-turn attention network to obtain facts relevant to a question. However, the information-transition phenomenon present in these facts prevents such methods from retrieving all relevant information. In this paper, we propose a multi-turn attentional memory network for visual dialog. First, we propose an attentional memory network that maintains image regions and the historical dialog in two memory banks and attends the question to be answered to both the visual and textual banks to obtain multi-modal facts. Further, considering the information-transition phenomenon, we design a multi-turn attention architecture that attends to the memory banks over multiple turns to retrieve more facts and thus produce a better answer. We evaluate the proposed model on the VisDial v0.9 dataset, and the experimental results demonstrate its effectiveness.
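The abstract's core idea can be sketched as follows: attend a question vector to two memory banks (image-region features and dialog-history features) and repeat the retrieval over several turns, folding retrieved facts back into the query. This is a minimal illustration, not the paper's implementation; the additive query update and dot-product scoring are simplifying assumptions.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(query, memory):
    """One attention turn: score each memory slot against the query,
    then return the attention-weighted sum of slots (a retrieved fact)."""
    scores = memory @ query          # (num_slots,)
    weights = softmax(scores)        # attention distribution over slots
    return weights @ memory          # (dim,)

def multi_turn_attention(question, visual_bank, textual_bank, turns=2):
    """Attend the question to both banks for several turns, refining the
    query with the retrieved multi-modal facts after each turn."""
    query = question
    for _ in range(turns):
        visual_fact = attend(query, visual_bank)
        textual_fact = attend(query, textual_bank)
        # Simple additive fusion (assumed here; the paper's fusion differs).
        query = query + visual_fact + textual_fact
    return query

rng = np.random.default_rng(0)
dim = 8
question = rng.normal(size=dim)
visual_bank = rng.normal(size=(5, dim))   # e.g. 5 image-region features
textual_bank = rng.normal(size=(3, dim))  # e.g. 3 dialog-history utterances
fused = multi_turn_attention(question, visual_bank, textual_bank)
print(fused.shape)  # (8,)
```

Running more turns lets later retrievals condition on facts found earlier, which is the intuition behind the multi-turn design described in the abstract.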

APA

Kong, D., & Wu, F. (2018). Visual dialog with multi-turn attentional memory network. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11164 LNCS, pp. 611–621). Springer Verlag. https://doi.org/10.1007/978-3-030-00776-8_56
