The paper focuses on the application of preprocessing techniques to short, informal textual documents written in different natural languages. The goal is to evaluate how preprocessing affects both the quality of the results and the computational complexity of a text mining process designed to reveal knowledge hidden in the data. An extensive set of experiments was carried out on real-world text data, applying spelling-error correction, stemming, stop-word removal, and their combinations. Support vector machines, decision trees, and k-means clustering, as commonly used methods, were employed to analyze the text data. The quality of text mining was generally not significantly affected; however, a positive impact in the form of reduced computational complexity was observed.
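To make the preprocessing combinations concrete, here is a minimal sketch of the three steps and of how each one shrinks the vocabulary. The vocabulary is the set of distinct terms, which in a bag-of-words model equals the number of features passed to a classifier or clusterer. Everything in it is hypothetical and for illustration only: the tiny stop-word list, the spelling lookup table, the crude suffix-stripping "stemmer" (not the Porter or Snowball algorithm), the fixed step order, and the three-sentence corpus. It is not the authors' pipeline.

```python
from itertools import combinations

# Hypothetical toy resources for illustration only. A real experiment would use
# language-specific stop-word lists, stemmers, and spelling dictionaries.
STOP_WORDS = {"the", "a", "is", "was", "and", "very", "it", "this"}
SPELLING_FIXES = {"grate": "great", "hotell": "hotel", "romm": "room"}
SUFFIXES = ("ing", "ed", "s")  # crude suffix stripping, NOT the Porter stemmer


def correct_spelling(tokens):
    """Replace known misspellings using a lookup table."""
    return [SPELLING_FIXES.get(t, t) for t in tokens]


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def stem(tokens):
    """Strip the first matching suffix if a stem of at least 3 characters remains."""
    out = []
    for t in tokens:
        for suf in SUFFIXES:
            if t.endswith(suf) and len(t) - len(suf) >= 3:
                t = t[: -len(suf)]
                break
        out.append(t)
    return out


# Steps are always applied in this (assumed) order, whatever subset is chosen.
ORDER = ("spelling", "stopwords", "stemming")
STEPS = {"spelling": correct_spelling, "stopwords": remove_stop_words, "stemming": stem}


def preprocess(docs, steps):
    """Lowercase, split on whitespace, then apply the chosen steps in ORDER."""
    result = []
    for doc in docs:
        tokens = doc.lower().split()
        for name in ORDER:
            if name in steps:
                tokens = STEPS[name](tokens)
        result.append(tokens)
    return result


def vocabulary_size(token_docs):
    """Number of distinct terms, i.e. the dimensionality of a bag-of-words model."""
    return len({t for doc in token_docs for t in doc})


# Hypothetical short, informal documents containing spelling errors.
docs = [
    "The hotell was grate and the room was very clean",
    "This romm is clean and the staff was helpful",
    "Cleaning staff cleaned the rooms",
]

# Try every combination of the three steps, including no preprocessing at all.
for r in range(len(ORDER) + 1):
    for combo in combinations(ORDER, r):
        label = "+".join(combo) or "none"
        size = vocabulary_size(preprocess(docs, set(combo)))
        print(f"{label:<28} vocabulary size = {size}")
```

On this toy corpus the vocabulary shrinks from 16 distinct terms with no preprocessing to 6 with all three steps combined. Removing stop words before stemming is a deliberate choice here: otherwise the crude stemmer would turn "this" into "thi", which the stop-word list no longer matches.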
Dařena, F., & Žižka, J. (2015). Interdependence of text mining quality and the input data preprocessing. In Advances in Intelligent Systems and Computing (Vol. 347, pp. 141–150). Springer Verlag. https://doi.org/10.1007/978-3-319-18476-0_15