MLASK: Multimodal Summarization of Video-based News Articles

Abstract

In recent years, the pattern of news consumption has been changing. The most popular multimedia news formats are now multimodal – the reader is often presented not only with a textual article but also with a short, vivid video. To draw the reader's attention, such video-based articles are usually presented as a short textual summary paired with an image thumbnail. In this paper, we introduce MLASK (MultimodaL Article Summarization Kit) – a new dataset of video-based news articles paired with a textual summary and a cover picture, all obtained by automatically crawling several news websites. We demonstrate how the proposed dataset can be used to model the task of multimodal summarization by training a Transformer-based neural model. We also examine the effects of pre-training: using generative pre-trained language models improves model performance, but (additional) pre-training on the simpler task of text summarization yields even better results. Our experiments suggest that the benefits of pre-training and of using additional modalities in the input are not orthogonal.

Citation (APA)

Krubiński, M., & Pecina, P. (2023). MLASK: Multimodal Summarization of Video-based News Articles. In EACL 2023 - 17th Conference of the European Chapter of the Association for Computational Linguistics, Findings of EACL 2023 (pp. 880–894). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.findings-eacl.67
