Automated text categorization based on readability fingerprints

Abstract

This paper introduces the use of 15 different readability indices as a fingerprint that enables the classification of documents into different categories. A classification based on such fingerprints alone is not necessarily superior to document categorization based on dedicated dictionaries, but the fingerprints can enhance the overall classification rate when proper data fusion techniques are applied. For other text mining applications, such as language classification, plagiarism detection, or author identification, the accuracy of text categorization methods based on readability fingerprints can even exceed that of a dictionary-based approach. A novel extension of the readability indices is the addition of histograms based on the word length of all the dictionary words used in the text and a dictionary of the most common easy words in the English language. © Springer-Verlag Berlin Heidelberg 2007.
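
To make the fingerprint idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes only two well-known indices (Flesch Reading Ease and the Automated Readability Index) instead of the 15 used in the paper, a crude vowel-group syllable counter, and a word-length histogram over all words in the text rather than over dictionary words. The names `count_syllables`, `fingerprint`, and `max_len` are hypothetical.

```python
import re
from collections import Counter


def count_syllables(word):
    # Crude heuristic (an assumption, not from the paper):
    # count runs of consecutive vowels, with a minimum of one.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def fingerprint(text, max_len=10):
    """Return a feature vector: [Flesch, ARI, word-length histogram...]."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)

    # Guard against empty input to avoid division by zero.
    n_sent = max(1, len(sentences))
    n_words = max(1, len(words))
    n_chars = sum(len(w) for w in words)
    n_syll = sum(count_syllables(w) for w in words)

    # Flesch Reading Ease (standard published coefficients).
    flesch = 206.835 - 1.015 * (n_words / n_sent) - 84.6 * (n_syll / n_words)
    # Automated Readability Index (standard published coefficients).
    ari = 4.71 * (n_chars / n_words) + 0.5 * (n_words / n_sent) - 21.43

    # Normalized word-length histogram; words longer than max_len
    # are counted in the last bin.
    counts = Counter(min(len(w), max_len) for w in words)
    hist = [counts.get(k, 0) / n_words for k in range(1, max_len + 1)]

    return [flesch, ari] + hist


if __name__ == "__main__":
    print(fingerprint("The cat sat. The dog ran."))
```

The resulting vector could be passed to any standard classifier, or concatenated with dictionary-based features as one simple form of data fusion; which fusion technique the paper uses is not stated in the abstract.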

Embrechts, M. J., Linton, J., Bogaerts, W. F., Heyns, B., & Evangelista, P. (2007). Automated text categorization based on readability fingerprints. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4669 LNCS, pp. 408–416). Springer Verlag. https://doi.org/10.1007/978-3-540-74695-9_42
