Multimodal Sentence Summarization via Multimodal Selective Encoding

41 citations · 79 Mendeley readers

Abstract

This paper studies the problem of generating a summary for a given sentence-image pair. Existing multimodal sequence-to-sequence approaches mainly focus on enhancing the decoder with visual signals, ignoring that the image can also improve the encoder's ability to identify the highlights of a news event or document. We therefore propose a multimodal selective gate network that models reciprocal relationships between textual features and multi-level visual features, including a global image descriptor, activation grids, and object proposals, to select the highlights of the event when encoding the source sentence. In addition, we introduce a modality regularization that encourages the summary to capture the highlights embedded in the image more accurately. To verify the generality of our model, we apply the multimodal selective gate to both a text-based decoder and a multimodal decoder. Experimental results on a public multimodal sentence summarization dataset demonstrate the advantage of our models over baselines. Further analysis suggests that the proposed multimodal selective gate network can effectively select important information from the input sentence.
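The selective-encoding idea in the abstract can be sketched roughly as follows: each encoder hidden state h_i is rescaled by an element-wise sigmoid gate computed from the concatenation of h_i with a visual feature vector v. This is a hypothetical pure-Python illustration using only a single global image vector; the actual model also fuses activation grids and object proposals, and uses learned neural-network parameters rather than hand-set weights.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def selective_gate(hidden_states, visual_feature, weights, bias):
    """Sketch of a multimodal selective gate.

    hidden_states: list of encoder hidden vectors h_i (each a list of floats)
    visual_feature: global image vector v (list of floats)
    weights, bias: illustrative gate parameters; row r of `weights` and
        bias[r] produce dimension r of the gate from [h_i; v]
    Returns gated hidden states h_i' = h_i * sigmoid(W [h_i; v] + b).
    """
    gated = []
    for h in hidden_states:
        concat = h + visual_feature  # concatenation [h_i; v]
        gate = [sigmoid(sum(w * x for w, x in zip(row, concat)) + b_r)
                for row, b_r in zip(weights, bias)]
        # Element-wise rescaling: gate values in (0, 1) suppress or keep
        # each dimension of the hidden state.
        gated.append([h_d * g_d for h_d, g_d in zip(h, gate)])
    return gated
```

Because every gate value lies in (0, 1), the gated representation can only attenuate, never amplify, each dimension of the original hidden state; downstream decoding then attends over these filtered states.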

Citation (APA)

Li, H., Zhu, J., Zhang, J., Zong, C., & He, X. (2020). Multimodal Sentence Summarization via Multimodal Selective Encoding. In COLING 2020 - 28th International Conference on Computational Linguistics, Proceedings of the Conference (pp. 5655–5667). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2020.coling-main.496