IceSum: An Icelandic Text Summarization Corpus

Abstract

Automatic Text Summarization (ATS) is the task of generating concise and fluent summaries from one or more documents. In this paper, we present IceSum, the first Icelandic corpus annotated with human-generated summaries. IceSum consists of 1,000 online news articles and their extractive summaries. We train and evaluate several neural network-based models on this dataset, comparing them against a selection of baseline methods. The best model obtains a ROUGE-2 recall score of 71.06, outperforming all baseline methods. Furthermore, we evaluate how the amount of training data affects the quality of the generated summaries. Our results show that while the corpus is sufficiently large to train a well-performing model, there could still be significant gains from increasing the size of the training set. We release the corpus and the models with an open license.
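
As an illustration of the metric quoted above, ROUGE-2 recall measures the fraction of reference-summary bigrams that also occur in the generated summary. The sketch below is a minimal version under stated assumptions: lowercasing and whitespace tokenization are simplifications, and the function names are hypothetical. It is not the evaluation tooling used by the authors, which this page does not describe.

```python
from collections import Counter


def bigrams(tokens):
    """Count adjacent token pairs in a token list."""
    return Counter(zip(tokens, tokens[1:]))


def rouge2_recall(candidate: str, reference: str) -> float:
    """Share of reference bigrams that also appear in the candidate.

    Assumption: lowercased whitespace tokenization, no stemming.
    """
    cand = bigrams(candidate.lower().split())
    ref = bigrams(reference.lower().split())
    total = sum(ref.values())
    if total == 0:
        # A reference with fewer than two tokens has no bigrams.
        return 0.0
    # Clip each match by how often the bigram occurs in the candidate.
    overlap = sum(min(count, cand[bg]) for bg, count in ref.items())
    return overlap / total


if __name__ == "__main__":
    score = rouge2_recall("the cat sat on the mat", "the cat sat on a mat")
    print(f"ROUGE-2 recall: {score:.2f}")
```

This function returns a value between 0 and 1. Scores such as 71.06 are conventionally the same quantity multiplied by 100.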

Cite

Daðason, J. F., Loftsson, H., Sigurðardóttir, S. L., & Björnsson, Þ. (2021). IceSum: An Icelandic Text Summarization Corpus. In NAACL-HLT 2021 - 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Student Research Workshop (pp. 9–14). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2021.naacl-srw.2
