Information-Theoretic Text Hallucination Reduction for Video-grounded Dialogue

Abstract

Video-grounded Dialogue (VGD) aims to decode an answer sentence to a question regarding a given video and dialogue context. Despite the recent success of multi-modal reasoning in generating answer sentences, existing dialogue systems still suffer from a text hallucination problem: indiscriminate text-copying from input texts without an understanding of the question. This arises from learning a spurious correlation: answer sentences in the dataset usually include words from the input texts, so the VGD system relies excessively on copying words from the input texts in the hope that those words overlap with the ground-truth texts. Hence, we design the Text Hallucination Mitigating (THAM) framework, which incorporates a Text Hallucination Regularization (THR) loss derived from the proposed information-theoretic text hallucination measurement approach. Applying THAM to current dialogue systems validates its effectiveness on VGD benchmarks (i.e., AVSD@DSTC7 and AVSD@DSTC8) and shows enhanced interpretability.
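
To make the text-copying behaviour described above concrete, the following minimal Python sketch computes a surface-level copy ratio: the fraction of answer tokens that also appear in the input texts (question and dialogue history). This is only an illustrative proxy and is not the information-theoretic measurement or the THR loss proposed in the paper. The function names `tokenize` and `copy_ratio`, the tokenization rule, and the example sentences are all hypothetical.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def copy_ratio(answer, input_texts):
    """Return the fraction of answer tokens that occur in any input text.

    A crude surface proxy for text-copying, NOT the paper's measure.
    """
    answer_tokens = tokenize(answer)
    if not answer_tokens:
        return 0.0
    # Collect every token seen in the question and dialogue history.
    source_vocab = set()
    for text in input_texts:
        source_vocab.update(tokenize(text))
    copied = sum(1 for tok in answer_tokens if tok in source_vocab)
    return copied / len(answer_tokens)


# Hypothetical example inputs, invented for illustration only.
question = "What is the man holding in his hand?"
history = ["Is there a man in the video? Yes, there is a man in the kitchen."]
answer = "The man is holding a cup in the kitchen."

ratio = copy_ratio(answer, [question] + history)
print(f"copy ratio: {ratio:.2f}")
```

A high ratio by itself does not indicate hallucination, since correct answers also reuse input words; the abstract's point is that a system can exploit this overlap without understanding the question.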

Cite

Yoon, S., Yoon, E., Yoon, H. S., Kim, J., & Yoo, C. D. (2022). Information-Theoretic Text Hallucination Reduction for Video-grounded Dialogue. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022 (pp. 4182–4193). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.emnlp-main.280
