Abstract
Multimodal summarization for open-domain videos is an emerging task that aims to generate a summary from multisource information (video, audio, transcript). Despite the success of recent multiencoder-decoder frameworks on this task, existing methods lack fine-grained interactions among the multisource inputs. Moreover, unlike other multimodal tasks, this task involves longer multimodal sequences with more redundancy and noise. To address these two issues, we propose a multistage fusion network with a fusion forget gate module, which models fine-grained interactions between the multisource modalities through a multistep fusion schema and controls the flow of redundant information between long multimodal sequences via a forgetting module. Experimental results on the How2 dataset show that our proposed model achieves new state-of-the-art performance. Comprehensive analysis empirically verifies the effectiveness of our fusion schema and forgetting module across multiple encoder-decoder architectures. Notably, even with high-noise ASR transcripts (WER > 30%), our model still achieves performance close to that of the model trained on ground-truth transcripts, which reduces the cost of manual annotation.
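To make the gating idea concrete, the sketch below shows a generic forget-gate-style fusion between two aligned modality sequences. This is a minimal illustration of the concept only, not the authors' implementation; the module name FusionForgetGate, the variable names (text_feats, video_feats), and all dimensions are assumptions introduced for this example.

```python
# Illustrative sketch: a gated fusion between two aligned modality sequences,
# loosely following the "forget gate" idea in the abstract. All names, shapes,
# and design choices here are assumptions, not the paper's actual code.
import torch
import torch.nn as nn


class FusionForgetGate(nn.Module):
    """Gate one modality's features before fusing them into another.

    The gate is computed from both modalities and suppresses ("forgets")
    redundant or noisy positions in the source sequence.
    """

    def __init__(self, text_dim: int, video_dim: int, hidden_dim: int):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Linear(text_dim + video_dim, hidden_dim),
            nn.Sigmoid(),  # values in (0, 1): how much of each video feature to keep
        )
        self.proj_video = nn.Linear(video_dim, hidden_dim)
        self.proj_text = nn.Linear(text_dim, hidden_dim)

    def forward(self, text_feats: torch.Tensor, video_feats: torch.Tensor) -> torch.Tensor:
        # text_feats:  (batch, seq_len, text_dim)   -- e.g. transcript encoder states
        # video_feats: (batch, seq_len, video_dim)  -- e.g. time-aligned video encoder states
        gate = self.gate(torch.cat([text_feats, video_feats], dim=-1))
        gated_video = gate * self.proj_video(video_feats)   # forget redundant video information
        return self.proj_text(text_feats) + gated_video     # fused representation


if __name__ == "__main__":
    fuse = FusionForgetGate(text_dim=512, video_dim=2048, hidden_dim=512)
    text = torch.randn(2, 100, 512)    # long transcript sequence
    video = torch.randn(2, 100, 2048)  # frame-level video features, aligned to the transcript
    print(fuse(text, video).shape)     # torch.Size([2, 100, 512])
```

In this simplified view, the sigmoid gate plays the role of the forgetting mechanism: positions where the two modalities carry redundant or noisy information can be down-weighted before fusion, which is the behavior the abstract attributes to the fusion forget gate module.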
Citation
Liu, N., Sun, X., Yu, H., Zhang, W., & Xu, G. (2020). Multistage fusion with forget gate for multimodal summarization in open-domain videos. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020) (pp. 1834–1845). Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.emnlp-main.144