Interdependence of text mining quality and the input data preprocessing

František Dařena; Jan Žižka

Conference Proceedings

Advances in Intelligent Systems and Computing (2015) 347 141-150

DOI: 10.1007/978-3-319-18476-0_15

3 Citations

11 Readers

Abstract

The paper focuses on the application of preprocessing techniques to short, informal textual documents written in different natural languages. The goal is to evaluate the impact of these techniques on the quality of the results and on the computational complexity of the text mining process designed to reveal knowledge hidden in the data. An extensive number of experiments were carried out on real-world text data, applying correction of spelling errors, stemming, stop word removal, and their combinations. Support vector machines, decision trees, and k-means, as commonly used methods, were considered for analyzing the text data. The text mining quality was generally not influenced significantly; however, a positive impact in the form of decreased computational complexity was observed.
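One plausible reason why stop word removal and stemming reduce computational cost is that they shrink the vocabulary, and with it the number of bag-of-words features a learner has to handle. The following is a minimal sketch of that effect in plain Python, not the authors' pipeline: the stop word list, the suffix-stripping `naive_stem` function, and the sample documents are all illustrative assumptions, and a real experiment would use a proper stemmer and a language-specific stop list.

```python
import re

# Tiny illustrative stop word list (an assumption, not the paper's list).
STOP_WORDS = {"the", "a", "an", "is", "was", "and", "it", "very", "were"}

# Suffixes for a crude stemmer that stands in for a real one (hypothetical).
SUFFIXES = ("ing", "ed", "s")


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def naive_stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text, remove_stop_words=False, stem=False):
    """Tokenize and optionally apply stop word removal and stemming."""
    tokens = tokenize(text)
    if remove_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    if stem:
        tokens = [naive_stem(t) for t in tokens]
    return tokens


def vocabulary_size(documents, **options):
    """Count distinct tokens (bag-of-words features) across all documents."""
    vocab = set()
    for doc in documents:
        vocab.update(preprocess(doc, **options))
    return len(vocab)


# Made-up short informal documents, used only for illustration.
DOCS = [
    "The room was clean and the staff were helpful",
    "Rooms cleaned daily, staff helping a lot",
    "It is a very clean room",
]

raw_size = vocabulary_size(DOCS)
reduced_size = vocabulary_size(DOCS, remove_stop_words=True, stem=True)
print(raw_size, reduced_size)
```

On these three toy documents the vocabulary drops from 17 distinct tokens to 7 once both steps are applied. Whether such a reduction changes classification or clustering quality is the empirical question the paper addresses; the sketch only shows the change in feature count.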

Citation (APA)

Dařena, F., & Žižka, J. (2015). Interdependence of text mining quality and the input data preprocessing. In Advances in Intelligent Systems and Computing (Vol. 347, pp. 141–150). Springer Verlag. https://doi.org/10.1007/978-3-319-18476-0_15
