An open corpus of everyday documents for simplification tasks

David Pellow; Maxine Eskenazi

Conference Proceedings (Open Access)

Proceedings of the 3rd Workshop on Predicting and Improving Text Readability for Target Reader Populations, PITR 2014 at the 14th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2014 (2014) 84-93

DOI: 10.3115/v1/w14-1210

Abstract

In recent years, interest in creating statistical automated text simplification systems has increased. Many of these systems have used parallel corpora of articles taken from Wikipedia and Simple Wikipedia, or Simple Wikipedia revision histories, to generate Simple Wikipedia articles. In this work, we motivate the need to construct a large, accessible corpus of everyday documents paired with their simplifications, for developing and evaluating simplification systems that make everyday documents more accessible. We present a detailed description of what this corpus will look like, as well as the basic corpus of everyday documents we have already collected. This basic corpus contains everyday documents from many domains, including driver's licensing, government aid and banking, and totals over 120,000 sentences. We also describe our preliminary work evaluating the feasibility of using crowdsourcing to generate simplifications for these documents. This work is the basis for our future extended corpus, which will be available to the community of researchers interested in the simplification of everyday documents.

Citation (APA)

Pellow, D., & Eskenazi, M. (2014). An open corpus of everyday documents for simplification tasks. In Proceedings of the 3rd Workshop on Predicting and Improving Text Readability for Target Reader Populations, PITR 2014 at the 14th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2014 (pp. 84–93). Association for Computational Linguistics (ACL). https://doi.org/10.3115/v1/w14-1210