Abstract
Visual dialog demonstrates several important aspects of multimodal artificial intelligence; however, it is hindered by visual grounding and visual coreference resolution problems. To overcome these problems, we propose a novel neural module network for visual dialog (NMN-VD). NMN-VD is an efficient question-customized modular network that assembles only the modules required to decide the answer after analyzing the input question. In particular, the model includes a Refer module that uses a reference pool to effectively locate the visual region indicated by a pronoun, addressing visual coreference resolution, an important challenge in visual dialog. The model also provides a method for distinguishing impersonal pronouns, which do not require visual coreference resolution, from general pronouns and handling them separately. Furthermore, the model includes a new Compare module that effectively handles the comparison questions found in visual dialogs, as well as a Find module that applies a triple-attention mechanism to solve the visual grounding problem between the question and the image. The results of various experiments conducted on a large-scale benchmark dataset verify the efficacy and high performance of the proposed NMN-VD model.
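The abstract's core idea, assembling only the modules a question needs, can be illustrated with a short sketch. The snippet below is a hypothetical PyTorch illustration of that composition pattern: the module names (Find, Refer, Compare) follow the abstract, but all layer choices, tensor shapes, the reference-pool format, and the layout dispatch are illustrative assumptions rather than the authors' implementation (in particular, the paper's triple-attention Find mechanism and full Refer design are not reproduced here).

import torch
import torch.nn as nn
import torch.nn.functional as F

class FindModule(nn.Module):
    # Ground the question in the image: attention over region features.
    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, img_feats, q_emb):
        # img_feats: (regions, dim); q_emb: (dim,)
        scores = img_feats @ self.proj(q_emb)          # (regions,)
        return F.softmax(scores, dim=0)                # region attention

class ReferModule(nn.Module):
    # Resolve a pronoun by reusing attention maps stored in a reference pool
    # built from earlier dialog turns (assumed format: (entity_emb, attention)).
    def forward(self, reference_pool, q_emb):
        keys = torch.stack([k for k, _ in reference_pool])    # (pool, dim)
        weights = F.softmax(keys @ q_emb, dim=0)               # match pronoun to entities
        attns = torch.stack([a for _, a in reference_pool])    # (pool, regions)
        return weights @ attns                                  # blended region attention

class CompareModule(nn.Module):
    # Fuse two attended visual features for comparison questions.
    def __init__(self, dim):
        super().__init__()
        self.fc = nn.Linear(2 * dim, dim)

    def forward(self, feat_a, feat_b):
        return torch.relu(self.fc(torch.cat([feat_a, feat_b], dim=-1)))

def run_layout(layout, img_feats, q_emb, reference_pool, find, refer, compare):
    # Execute only the modules named by the (assumed) question-analysis layout;
    # the returned feature would feed an answer decoder, which is omitted here.
    if layout == "find":                               # e.g. "Is the dog brown?"
        return find(img_feats, q_emb) @ img_feats
    if layout == "refer":                              # e.g. "Is it asleep?" (pronoun)
        return refer(reference_pool, q_emb) @ img_feats
    if layout == "compare":                            # e.g. "Is it bigger than the dog?"
        a = find(img_feats, q_emb) @ img_feats         # entity named in the question
        b = refer(reference_pool, q_emb) @ img_feats   # entity referred to by the pronoun
        return compare(a, b)
    raise ValueError(f"unknown layout: {layout}")

# Toy usage with random features; all dimensions are arbitrary.
dim, regions = 8, 5
modules = (FindModule(dim), ReferModule(), CompareModule(dim))
img_feats = torch.randn(regions, dim)
pool = [(torch.randn(dim), F.softmax(torch.randn(regions), dim=0))]
answer_feat = run_layout("refer", img_feats, torch.randn(dim), pool, *modules)

In this sketch the layout string merely stands in for whatever question analysis the model performs; the point is only that the Refer path reuses attention from earlier turns instead of re-grounding the pronoun from scratch, while the Find path grounds entities named explicitly in the question.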
Cho, Y., & Kim, I. (2021). NMN-VD: A neural module network for visual dialog. Sensors (Switzerland), 21(3), 1–18. https://doi.org/10.3390/s21030931