An analysis of crowdsourced text simplifications


Abstract

We present a study of the text simplification operations undertaken collaboratively by Simple English Wikipedia contributors. The aim is to understand whether a complex-simple parallel corpus built from this version of Wikipedia is an appropriate data source from which to induce simplification rules, and whether the different operations performed by humans can be categorised automatically. A subset of the corpus was first manually analysed to identify its transformation operations. We then built machine learning models to automatically classify segments according to these transformations. Such a classification could be used, for example, to filter out potentially noisy transformations. Our results show that the most common transformation operations performed by humans are paraphrasing (39.80%) and drop of information (26.76%), which are among the most difficult operations to generalise from data. They are also the most difficult operations to identify automatically, yielding the lowest classifier accuracies among all operations (73% and 59%, respectively).
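As an illustrative sketch only (not the authors' system), a classifier over complex-simple segment pairs might start from simple surface features such as length ratio and lexical overlap. The feature set, thresholds, and decision rules below are arbitrary assumptions chosen for demonstration:

```python
# Illustrative baseline for labelling a complex->simple segment pair
# with a coarse simplification operation. Thresholds are assumptions.

def tokens(text: str) -> list[str]:
    """Lowercase, whitespace-tokenise, and strip common punctuation."""
    return [w.strip(".,;:") for w in text.lower().split()]

def features(complex_seg: str, simple_seg: str) -> tuple[float, float]:
    """Return (length ratio, lexical overlap) for a segment pair."""
    c, s = tokens(complex_seg), tokens(simple_seg)
    length_ratio = len(s) / len(c) if c else 0.0
    overlap = len(set(c) & set(s)) / len(set(c)) if c else 0.0
    return length_ratio, overlap

def classify(complex_seg: str, simple_seg: str) -> str:
    """Assign a coarse operation label from the two surface features."""
    ratio, overlap = features(complex_seg, simple_seg)
    if ratio < 0.7 and overlap >= 0.5:
        # Much shorter output, but mostly the same words: content removed.
        return "drop of information"
    if overlap < 0.5:
        # Similar length, but wording largely changed: rephrasing.
        return "paraphrase"
    return "minor edit"
```

In practice one would train a supervised classifier on the manually annotated subset rather than hand-pick thresholds; this sketch only illustrates why paraphrasing and information drop are hard to separate, since both alter surface overlap in overlapping ways.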

Citation (APA)

Amancio, M. A., & Specia, L. (2014). An analysis of crowdsourced text simplifications. In Proceedings of the 3rd Workshop on Predicting and Improving Text Readability for Target Reader Populations, PITR 2014 at the 14th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2014 (pp. 123–130). Association for Computational Linguistics (ACL). https://doi.org/10.3115/v1/w14-1214
