The article describes a comparative study of text preprocessing techniques for natural language call routing. Seven unsupervised and supervised term weighting methods were considered. Four dimensionality reduction methods were applied: stop-word filtering with stemming, feature selection based on term weights, feature transformation based on term clustering, and a novel feature transformation method based on the classes that terms belong to. As classification algorithms, we used k-NN and Fast Large Margin, an SVM-based algorithm. Numerical experiments showed that the most effective term weighting method is Term Relevance Ratio (TRR). Unlike the other dimensionality reduction methods, feature transformation based on term clustering can substantially reduce dimensionality without significantly changing classification effectiveness. The novel feature transformation method reduces dimensionality radically: the number of features equals the number of classes.
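The abstract gives no formulas, so the following Python sketch illustrates only one possible reading of the class-based feature transformation. In this reading, each term is assigned to the class where its supervised weight is highest, and a document becomes one feature per class: the sum of the weights of its terms assigned to that class. The weighting function is an assumed TRR-style form, log2(2 + P(t|c) / P(t|not c)) with add-alpha smoothing. The function names and the toy call-routing data are hypothetical, and this is not the authors' implementation.

```python
import math
from collections import Counter


def trr_style_weights(docs, labels, alpha=1.0):
    """Assumed TRR-style weight per (term, class): log2(2 + P(t|c) / P(t|not c))."""
    classes = sorted(set(labels))
    vocab = sorted({t for doc in docs for t in doc})
    v = len(vocab)
    # Term frequencies summed over all documents of each class.
    in_counts = {c: Counter() for c in classes}
    for doc, lab in zip(docs, labels):
        in_counts[lab].update(doc)
    total = Counter()
    for c in classes:
        total.update(in_counts[c])
    grand_total = sum(total.values())
    weights = {}
    for c in classes:
        in_total = sum(in_counts[c].values())
        out_total = grand_total - in_total
        for t in vocab:
            # Add-alpha smoothed term probabilities inside and outside class c.
            p_in = (in_counts[c][t] + alpha) / (in_total + alpha * v)
            p_out = (total[t] - in_counts[c][t] + alpha) / (out_total + alpha * v)
            weights[(t, c)] = math.log2(2 + p_in / p_out)
    return classes, vocab, weights


def assign_terms_to_classes(classes, vocab, weights):
    """Hypothetical membership rule: each term belongs to its highest-weight class."""
    return {t: max(classes, key=lambda c: weights[(t, c)]) for t in vocab}


def class_membership_features(doc, classes, membership, weights):
    """Map a tokenized document to len(classes) features."""
    index = {c: i for i, c in enumerate(classes)}
    features = [0.0] * len(classes)
    for t in doc:
        if t not in membership:  # terms unseen in training are ignored
            continue
        c = membership[t]
        features[index[c]] += weights[(t, c)]
    return features


if __name__ == "__main__":
    docs = [["check", "balance"], ["balance", "account"],
            ["pay", "bill"], ["bill", "payment", "due"]]
    labels = ["balance", "balance", "billing", "billing"]
    classes, vocab, w = trr_style_weights(docs, labels)
    membership = assign_terms_to_classes(classes, vocab, w)
    print(classes, class_membership_features(["my", "balance", "bill"], classes, membership, w))
```

Whatever the vocabulary size, each document ends up with exactly as many features as there are classes. A k-NN or SVM classifier can then be trained on these vectors.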
Sergienko, R., Shan, M., & Schmitt, A. (2017). A comparative study of text preprocessing techniques for natural language call routing. In Lecture Notes in Electrical Engineering (Vol. 427 LNEE, pp. 23–37). Springer Verlag. https://doi.org/10.1007/978-981-10-2585-3_2