A New Dataset and Efficient Baselines for Document-level Text Simplification in German

Abstract

The task of document-level text simplification is very similar to summarization, with the additional difficulty of reducing complexity. We introduce a new data set of German texts collected from the Swiss news magazine 20 Minuten ('20 Minutes'), consisting of full articles paired with simplified summaries. Furthermore, we present experiments on automatic text simplification (ATS) with the pretrained multilingual mBART and a modified, more memory-friendly version thereof, using both our new data set and existing simplification corpora. Our modifications of mBART let us train at a lower memory cost without much loss in performance; in fact, the smaller mBART even improves over the standard model in a setting with multiple simplification levels.

Cite

Rios, A., Spring, N., Kew, T., Kostrzewa, M., Säuberli, A., Müller, M., & Ebling, S. (2021). A New Dataset and Efficient Baselines for Document-level Text Simplification in German. In 3rd Workshop on New Frontiers in Summarization, NewSum 2021 - Workshop Proceedings (pp. 152–161). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2021.newsum-1.16
