How well do you know your summarization datasets?

Citations: 16 · Mendeley readers: 64

Abstract

State-of-the-art summarization systems are trained and evaluated on massive datasets scraped from the web. Despite their prevalence, we know very little about the underlying characteristics (data noise, summarization complexity, etc.) of these datasets, and how these affect system performance and the reliability of automatic metrics like ROUGE. In this study, we manually analyse 600 samples from three popular summarization datasets. Our study is driven by a six-class typology which captures different noise types (missing facts, entities) and degrees of summarization difficulty (extractive, abstractive). We follow with a thorough analysis of 27 state-of-the-art summarization models and 5 popular metrics, and report our key insights: (1) Datasets have distinct data quality and complexity distributions, which can be traced back to their collection process. (2) The performance of models and the reliability of metrics are dependent on sample complexity. (3) Faithful summaries often receive low scores because of the poor diversity of references. We release the code, annotated data and model outputs.
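To make insight (3) concrete: ROUGE rewards lexical overlap with the reference, so against a single reference a faithful paraphrase can score lower than a near-verbatim but factually wrong output. A minimal sketch of this effect, assuming Google's rouge-score Python package (this package and the example sentences are illustrative, not part of the paper's released code):

    from rouge_score import rouge_scorer

    # ROUGE-1/2/L F-measures, computed against a single reference summary.
    scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)

    reference = "The company reported a sharp drop in quarterly profits."
    faithful_paraphrase = "Quarterly earnings at the firm fell steeply."  # faithful, low lexical overlap
    lexical_copy = "The company reported a sharp drop in revenue."        # unfaithful, high overlap

    for name, candidate in [("paraphrase", faithful_paraphrase), ("copy", lexical_copy)]:
        scores = scorer.score(reference, candidate)  # signature: score(target, prediction)
        print(name, {k: round(v.fmeasure, 3) for k, v in scores.items()})

    # The faithful paraphrase scores far lower than the near-verbatim but
    # factually wrong candidate -- the reference-diversity problem the
    # authors describe.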

Citation (APA)

Tejaswin, P., Naik, D., & Liu, P. (2021). How well do you know your summarization datasets? In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 (pp. 3436–3449). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2021.findings-acl.303
