Unsupervised and Pseudo-Supervised Vision-Language Alignment in Visual Dialog


Abstract

Visual dialog requires models to give reasonable answers to a series of coherent questions grounded in the visual concepts of an image. However, most current work focuses either on attention-based fusion or on pre-training with large-scale image-text pairs, ignoring the critical role of explicit vision-language alignment in visual dialog. To remedy this defect, we propose a novel unsupervised and pseudo-supervised vision-language alignment approach for visual dialog (AlignVD). First, AlignVD uses visual and dialog encoders to represent images and dialogs. It then explicitly aligns visual concepts with textual semantics via unsupervised and pseudo-supervised vision-language alignment (UVLA and PVLA): UVLA performs alignment with a graph autoencoder, while PVLA uses dialog-guided visual grounding. Finally, based on the aligned visual and textual representations, AlignVD answers the question via a cross-modal decoder. Extensive experiments on two large-scale visual dialog datasets demonstrate the effectiveness of vision-language alignment, and our proposed AlignVD achieves new state-of-the-art results. In addition, our single model won first place on the visual dialog challenge leaderboard with an NDCG of 78.70, surpassing the previous best ensemble model by about 1 point.
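To make the pipeline concrete, below is a minimal sketch in PyTorch of the encode-align-decode flow the abstract describes. Every module name, dimension, and loss form here is an illustrative assumption (the paper's actual UVLA/PVLA formulations are not given in this abstract); it only shows how an unsupervised graph-autoencoder reconstruction loss and a pseudo-supervised grounding loss could both act on the same vision-language affinity before a cross-modal decoder produces the answer.

import torch
import torch.nn as nn
import torch.nn.functional as F

class AlignVDSketch(nn.Module):
    """Hypothetical stand-in for AlignVD, not the authors' implementation."""
    def __init__(self, dim=512, vocab=30522):
        super().__init__()
        self.visual_enc = nn.Linear(2048, dim)       # region features -> shared space
        self.dialog_enc = nn.Embedding(vocab, dim)   # stand-in for a transformer dialog encoder
        # UVLA: graph autoencoder over the region-token affinity graph (assumed form)
        self.gae_enc = nn.Linear(dim, dim)
        self.gae_dec = nn.Linear(dim, dim)
        # cross-modal decoder (assumed single cross-attention layer)
        self.decoder = nn.MultiheadAttention(dim, 8, batch_first=True)
        self.answer_head = nn.Linear(dim, vocab)

    def forward(self, regions, tokens, pseudo_region=None):
        v = self.visual_enc(regions)                     # (B, R, D) image regions
        t = self.dialog_enc(tokens)                      # (B, T, D) dialog tokens
        # vision-language affinity graph between regions and tokens
        adj = torch.softmax(v @ t.transpose(1, 2), -1)   # (B, R, T)
        # UVLA: encode text-aggregated regions, then reconstruct the graph
        z = self.gae_enc(adj @ t)                        # (B, R, D)
        recon = torch.softmax(self.gae_dec(z) @ t.transpose(1, 2), -1)
        uvla_loss = F.mse_loss(recon, adj)               # unsupervised reconstruction
        # PVLA: pseudo labels (index of the grounded region, assumed to come
        # from an external dialog-guided grounder) supervise the same affinity
        pvla_loss = (F.cross_entropy(adj.max(-1).values, pseudo_region)
                     if pseudo_region is not None else adj.new_zeros(()))
        # decode an answer from the aligned representations
        out, _ = self.decoder(t, z, z)                   # (B, T, D)
        return self.answer_head(out), uvla_loss + pvla_loss

In this sketch the two alignment signals are simply summed into one auxiliary loss next to the answer-generation objective; how the real model weights UVLA against PVLA, and what grounder produces the pseudo labels, would have to come from the full paper.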

Cite

APA

Chen, F., Zhang, D., Chen, X., Shi, J., Xu, S., & Xu, B. (2022). Unsupervised and Pseudo-Supervised Vision-Language Alignment in Visual Dialog. In MM 2022 - Proceedings of the 30th ACM International Conference on Multimedia (pp. 4142–4153). Association for Computing Machinery, Inc. https://doi.org/10.1145/3503161.3547776
