Multi-Document Summarization with Centroid-Based Pretraining

Abstract

In Multi-Document Summarization (MDS), the input can be modeled as a set of documents, and the output is its summary. In this paper, we focus on pretraining objectives for MDS. Specifically, we introduce a novel pretraining objective that selects the ROUGE-based centroid of each document cluster as a proxy for its summary. Our objective thus does not require human-written summaries and can be used for pretraining on a dataset consisting solely of document sets. Through zero-shot, few-shot, and fully supervised experiments on multiple MDS datasets, we show that our model, Centrum, is better than or comparable to a state-of-the-art model. We make the pretrained and finetuned models freely available to the research community.
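
The centroid-selection objective described in the abstract can be sketched in a few lines. The snippet below is a minimal illustration, not the paper's implementation: it assumes the open-source rouge_score package and uses mean ROUGE-1/2/L F1 as the scoring function; the exact ROUGE variant, tie-breaking, and how the remaining documents are concatenated into the model input are illustrative choices.

```python
# Sketch of ROUGE-based centroid selection for pretraining-pair creation.
# Assumptions (not from the paper): the `rouge_score` package, and the
# centroid defined as the document with the highest mean ROUGE F1
# against the other documents in its cluster.
from rouge_score import rouge_scorer


def select_centroid(cluster: list[str]) -> int:
    """Return the index of the cluster's ROUGE-based centroid document."""
    scorer = rouge_scorer.RougeScorer(
        ["rouge1", "rouge2", "rougeL"], use_stemmer=True
    )
    best_idx, best_score = 0, float("-inf")
    for i, candidate in enumerate(cluster):
        # Treat the candidate as a "summary" and score it against
        # every other document in the cluster.
        total = 0.0
        for j, reference in enumerate(cluster):
            if i == j:
                continue
            scores = scorer.score(reference, candidate)
            total += sum(s.fmeasure for s in scores.values()) / len(scores)
        mean_score = total / max(len(cluster) - 1, 1)
        if mean_score > best_score:
            best_idx, best_score = i, mean_score
    return best_idx


# Building a pretraining pair: the centroid becomes the target summary,
# and the remaining documents form the source (no human labels needed).
cluster = ["doc one text ...", "doc two text ...", "doc three text ..."]
target_idx = select_centroid(cluster)
target = cluster[target_idx]
source = [doc for k, doc in enumerate(cluster) if k != target_idx]
```

Because the target is drawn from the cluster itself, any corpus of document sets can serve as pretraining data, which is the point of the objective.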

Cite

APA

Puduppully, R., Jain, P., Chen, N. F., & Steedman, M. (2023). Multi-Document Summarization with Centroid-Based Pretraining. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (Vol. 2, pp. 128–138). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.acl-short.13
