Interdependence of text mining quality and the input data preprocessing

Abstract

The paper focuses on the application of preprocessing techniques to short, informal textual documents written in different natural languages. The goal is to evaluate the impact of these techniques on the quality of the results and on the computational complexity of a text mining process designed to reveal knowledge hidden in the data. A large number of experiments were carried out on real-world text data, applying spelling correction, stemming, stop word removal, and their combinations. Support vector machines, decision trees, and k-means, as commonly used methods, were used to analyze the text data. Text mining quality was generally not influenced significantly; however, a positive impact in the form of decreased computational complexity was observed.
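To make the preprocessing steps named in the abstract concrete, the following is a minimal Python sketch of stop word removal and stemming applied alone and in combination. It is not the authors' implementation: the stop word list, the suffix-stripping rule in `naive_stem`, the helper `vocabulary_size`, and the three sample documents are all hypothetical and chosen only for illustration. Spelling correction is left out. Vocabulary size is used here as a rough proxy for the dimensionality, and therefore the computational cost, that a downstream classifier or clustering algorithm would face.

```python
# Toy stop word list and suffix list; both are illustrative assumptions,
# not the resources used in the paper.
STOP_WORDS = {"the", "a", "was", "and", "is", "very", "were", "it"}
SUFFIXES = ("ing", "ed", "s")


def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


def remove_stop_words(tokens):
    """Drop tokens that appear in the toy stop word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def naive_stem(token):
    """Strip the first matching suffix if at least 3 characters remain."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def stem(tokens):
    """Apply the naive stemmer to every token."""
    return [naive_stem(t) for t in tokens]


def vocabulary_size(docs, steps):
    """Count distinct terms after applying the given preprocessing steps."""
    vocab = set()
    for doc in docs:
        tokens = tokenize(doc)
        for step in steps:
            tokens = step(tokens)
        vocab.update(tokens)
    return len(vocab)


# Hypothetical short, informal documents.
DOCS = [
    "the room was clean and the staff were helping",
    "staff helped a lot and rooms were cleaned",
    "it is a clean room",
]

if __name__ == "__main__":
    configurations = {
        "none": [],
        "stop words": [remove_stop_words],
        "stemming": [stem],
        "stop words + stemming": [remove_stop_words, stem],
    }
    for name, steps in configurations.items():
        print(name, vocabulary_size(DOCS, steps))
```

On this toy corpus the vocabulary shrinks from 15 distinct terms with no preprocessing to 5 when both steps are combined, which illustrates the kind of reduction in feature space that can lower computational cost. Whether result quality changes has to be measured separately with the chosen learning algorithm.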

Dařena, F., & Žižka, J. (2015). Interdependence of text mining quality and the input data preprocessing. In Advances in Intelligent Systems and Computing (Vol. 347, pp. 141–150). Springer Verlag. https://doi.org/10.1007/978-3-319-18476-0_15
