An open corpus of everyday documents for simplification tasks

18Citations
Citations of this article
78Readers
Mendeley users who have this article in their library.

Abstract

In recent years interest in creating statistical automated text simplification systems has increased. Many of these systems have used parallel corpora of articles taken from Wikipedia and Simple Wikipedia or from Simple Wikipedia revision histories and generate Simple Wikipedia articles. In this work we motivate the need to construct a large, accessible corpus of everyday documents along with their simplifications for the development and evaluation of simplification systems that make everyday documents more accessible. We present a detailed description of what this corpus will look like and the basic corpus of everyday documents we have already collected. This latter contains everyday documents from many domains including driver's licensing, government aid and banking. It contains a total of over 120,000 sentences. We describe our preliminary work evaluating the feasibility of using crowdsourcing to generate simplifications for these documents. This is the basis for our future extended corpus which will be available to the community of researchers interested in simplification of everyday documents.

Cite

CITATION STYLE

APA

Pellow, D., & Eskenazi, M. (2014). An open corpus of everyday documents for simplification tasks. In Proceedings of the 3rd Workshop on Predicting and Improving Text Readability for Target Reader Populations, PITR 2014 at the 14th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2014 (pp. 84–93). Association for Computational Linguistics (ACL). https://doi.org/10.3115/v1/w14-1210

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free