Classifying easy-to-read texts without parsing

Abstract

Document classification using automated linguistic analysis and machine learning (ML) has been shown to be a viable way forward for readability assessment. The best models can be trained to decide whether a text is easy to read with very high accuracy; for example, a model using 117 parameters from shallow, lexical, morphological and syntactic analyses achieves 98.9% accuracy. In this paper we compare models created by parameter optimization over subsets of that full model, to find out to what extent different high-performing models tend to consist of the same parameters, and whether it is possible to find models that use only features that do not require parsing. We used a genetic algorithm to systematically optimize parameter sets of fixed sizes, using the accuracy of a Support Vector Machine classifier as the fitness function. Our results show that it is possible to find models almost as good as the current best models while omitting parsing-based features.
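
The search procedure described in the abstract (a genetic algorithm over fixed-size parameter subsets, with classifier accuracy as fitness) might be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the data are synthetic, the feature indices and the `candidates` list are hypothetical, the crossover and mutation operators are one plausible choice among many, and a simple nearest-centroid classifier stands in for the Support Vector Machine so that the sketch runs with the Python standard library alone.

```python
import random


def make_toy_data(n_samples=200, n_features=12, informative=(0, 1, 2), seed=1):
    """Synthetic two-class data (an assumption; not the paper's corpus)."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = (i // 2) % 2  # both classes occur at even and at odd indices
        row = [rng.gauss(0, 1) for _ in range(n_features)]
        for f in informative:
            row[f] += 3.0 * label  # shift informative features for class 1
        X.append(row)
        y.append(label)
    return X, y


def holdout_accuracy(X, y, features):
    """Stand-in fitness: nearest-centroid accuracy on a fixed holdout split.

    Even-indexed rows are used for training, odd-indexed rows for testing.
    The paper uses SVM accuracy instead; this is a swapped-in substitute.
    """
    train, test = range(0, len(X), 2), range(1, len(X), 2)
    centroids = {}
    for label in set(y):
        rows = [X[i] for i in train if y[i] == label]
        centroids[label] = [sum(r[f] for r in rows) / len(rows) for f in features]
    correct = 0
    for i in test:
        point = [X[i][f] for f in features]
        pred = min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(point, centroids[c])),
        )
        correct += pred == y[i]
    return correct / len(test)


def select_features(X, y, candidates, subset_size, fitness,
                    pop_size=20, generations=15, mutation_rate=0.3, seed=0):
    """Genetic search over feature subsets of a fixed size.

    `candidates` lists the feature indices the search may use, e.g. only
    those that do not require parsing (hypothetical indices here).
    """
    rng = random.Random(seed)
    cache = {}

    def score(individual):
        if individual not in cache:
            cache[individual] = fitness(X, y, sorted(individual))
        return cache[individual]

    population = [frozenset(rng.sample(candidates, subset_size))
                  for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=score, reverse=True)
        parents = ranked[:pop_size // 2]  # truncation selection
        children = [ranked[0]]            # elitism: keep the current best
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Crossover: draw a fixed-size child from the parents' union.
            child = set(rng.sample(sorted(a | b), subset_size))
            if rng.random() < mutation_rate:
                unused = [f for f in candidates if f not in child]
                if unused:
                    # Mutation: swap one selected feature for an unused one.
                    child.remove(rng.choice(sorted(child)))
                    child.add(rng.choice(unused))
            children.append(frozenset(child))
        population = children
    best = max(population, key=score)
    return sorted(best), score(best)


X, y = make_toy_data()
non_parsing = [0, 1, 2, 3, 4, 5, 6, 7]  # hypothetical "no parsing needed" set
best, acc = select_features(X, y, non_parsing, 3, holdout_accuracy)
print(best, acc)
```

Because the fitness function is passed in as an argument, the stand-in could be replaced by cross-validated SVM accuracy, for instance with scikit-learn's `SVC` and `cross_val_score`, without changing the search loop. Restricting `candidates` to features that need no parsing corresponds to the abstract's question of whether such models can come close to the full model.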

Cite

Falkenjack, J., & Jönsson, A. (2014). Classifying easy-to-read texts without parsing. In Proceedings of the 3rd Workshop on Predicting and Improving Text Readability for Target Reader Populations, PITR 2014 at the 14th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2014 (pp. 114–122). Association for Computational Linguistics (ACL). https://doi.org/10.3115/v1/w14-1213
