MINWIKISPLIT: A sentence splitting corpus with minimal propositions

Christina Niklaus; André Freitas; Siegfried Handschuh

Conference Proceedings (Open Access)

INLG 2019 - 12th International Conference on Natural Language Generation, Proceedings of the Conference (2019) 118-123

DOI: 10.18653/v1/w19-8615

Abstract

We compiled a new sentence splitting corpus composed of 203K pairs of aligned complex source and simplified target sentences. In contrast to previously proposed text simplification corpora, which contain only a small number of split examples, we present a dataset in which each input sentence is broken down into a set of minimal propositions, i.e. a sequence of sound, self-contained utterances, each presenting a minimal semantic unit that cannot be further decomposed into meaningful propositions. This corpus is useful for developing sentence splitting approaches that learn how to transform sentences with a complex linguistic structure into a fine-grained representation of short sentences with a simple and more regular structure. Such sentences are easier for downstream applications to process, which facilitates and improves their performance.
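The pairing described in the abstract (one complex source sentence aligned with a set of minimal propositions) can be pictured with a small Python sketch. This is only an illustration under stated assumptions: the names `SplitPair` and `average_split_count` are hypothetical, the example sentence is invented rather than taken from the corpus, and nothing here reflects the corpus's actual file format.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class SplitPair:
    """Hypothetical container: one complex source sentence and its split targets."""
    source: str
    propositions: Tuple[str, ...]


def average_split_count(pairs: Sequence[SplitPair]) -> float:
    """Mean number of target propositions per source sentence (0.0 if empty)."""
    if not pairs:
        return 0.0
    return sum(len(p.propositions) for p in pairs) / len(pairs)


# Invented sentence for illustration only; it is NOT taken from the corpus.
example = SplitPair(
    source="The house, which was built in 1900, was sold after the owner moved abroad.",
    propositions=(
        "The house was built in 1900.",
        "The house was sold.",
        "This was after the owner moved abroad.",
    ),
)

if __name__ == "__main__":
    print(average_split_count([example]))
```

Each target in the sketch is a short, self-contained sentence, which is the property the abstract attributes to minimal propositions.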

Cite

Niklaus, C., Freitas, A., & Handschuh, S. (2019). MINWIKISPLIT: A sentence splitting corpus with minimal propositions. In INLG 2019 - 12th International Conference on Natural Language Generation, Proceedings of the Conference (pp. 118–123). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/w19-8615
