In this paper, we describe five feature selection techniques used for text classification. Information gain, the independent significance feature test, the chi-squared test, the odds ratio test, and frequency filtering have been compared on text benchmarks based on Wikipedia. For each method we present the classification quality obtained on the test datasets using a K-NN based approach. The main advantage of the evaluated approach is that reducing the dimensionality of the vector space improves the effectiveness of the classification task. The information gain method, which obtained the best results, has been used to evaluate the scalability of feature selection and classification. We also provide results indicating that feature selection is useful for obtaining common-sense features that describe naturally formed categories. © 2013 Springer-Verlag.
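As a rough illustration of the best-performing technique from the abstract, the following is a minimal sketch of information gain for term selection, computed as IG(t) = H(C) − [P(t)H(C|t) + P(t̄)H(C|t̄)] over a toy labeled corpus. The corpus, class names, and function names here are illustrative assumptions, not taken from the paper.

```python
import math
from collections import Counter

def entropy(counts):
    """Shannon entropy (bits) of a class-count distribution."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c)

def information_gain(docs, labels, term):
    """IG of a term: class entropy minus entropy conditioned on term presence.

    docs   -- list of token sets (one per document)
    labels -- list of class labels aligned with docs
    """
    n = len(docs)
    with_t = [l for d, l in zip(docs, labels) if term in d]
    without_t = [l for d, l in zip(docs, labels) if term not in d]
    h_c = entropy(list(Counter(labels).values()))
    h_cond = 0.0
    for subset in (with_t, without_t):
        if subset:
            h_cond += (len(subset) / n) * entropy(list(Counter(subset).values()))
    return h_c - h_cond

# Toy corpus (hypothetical data, for illustration only)
docs = [{"ball", "goal"}, {"ball", "score"}, {"vote", "law"}, {"law", "court"}]
labels = ["sport", "sport", "politics", "politics"]

# Rank the vocabulary by information gain; keeping only the top-k terms
# is the dimensionality reduction step applied before K-NN classification.
vocab = sorted(set().union(*docs))
ranked = sorted(vocab, key=lambda t: information_gain(docs, labels, t), reverse=True)
```

In this toy example, terms that occur in exactly one class (e.g. "ball", "law") achieve the maximum gain of 1 bit, so they rank first; selecting only such high-gain terms shrinks the vector space while preserving class-discriminating information.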
CITATION STYLE
Balicki, J., Krawczyk, H., Rymko, Ł., & Szymański, J. (2013). Selection of relevant features for text classification with K-NN. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7895 LNAI, pp. 477–488). https://doi.org/10.1007/978-3-642-38610-7_44