MINWIKISPLIT: A sentence splitting corpus with minimal propositions

Abstract

We compiled a new sentence splitting corpus composed of 203K pairs of aligned complex source and simplified target sentences. In contrast to previously proposed text simplification corpora, which contain only a small number of split examples, we present a dataset in which each input sentence is broken down into a set of minimal propositions, i.e. a sequence of sound, self-contained utterances, each presenting a minimal semantic unit that cannot be further decomposed into meaningful propositions. The corpus is useful for developing sentence splitting approaches that learn to transform sentences with a complex linguistic structure into a fine-grained representation of short sentences with a simple, more regular structure. Such sentences are easier for downstream applications to process, which in turn improves their performance.

Cite

Niklaus, C., Freitas, A., & Handschuh, S. (2019). MINWIKISPLIT: A sentence splitting corpus with minimal propositions. In INLG 2019 - 12th International Conference on Natural Language Generation, Proceedings of the Conference (pp. 118–123). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/w19-8615
