Persian text classification based on K-NN using wordnet

Mostafa Parchami; Bahareh Akhtar; Mirhossein Dezfoulian

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2012) 7345 LNAI 283-291

DOI: 10.1007/978-3-642-31087-4_30

Abstract

K-NN is widely used for text classification. Basic K-NN has poor accuracy, so additional methods should be applied to improve its accuracy and efficiency. In this paper we propose a method that uses WordNet to increase the similarity of documents in the same category. Documents are represented by single words and their frequencies; using WordNet, the frequencies of related words are adjusted to achieve higher accuracy. Information gain is used to eliminate terms that are not discriminative. Words like "and", "or" and "that" in English are not important in text classification, and the best way to eliminate them is to calculate their information gain. PCA is used to reduce the number of features and increase the speed of the method. Applying this method, we designed a faster and more accurate classifier for the Persian language. Experiments show that applying this preprocessing increases the accuracy and speed of K-NN. The accuracy of the proposed K-NN classifier on the Hamshahri corpus is 88.18%. © 2012 Springer-Verlag.
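The pipeline outlined in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy English documents, the `RELATED` dictionary (a hypothetical stand-in for WordNet lookups), the 0.5 expansion weight, the 0.1 information gain threshold, and the use of cosine similarity are all assumptions made for illustration. The PCA step is omitted to keep the example dependency-free.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(docs, labels, term):
    """Information gain of the binary feature 'term occurs in the document'."""
    present = [y for d, y in zip(docs, labels) if term in d]
    absent = [y for d, y in zip(docs, labels) if term not in d]
    conditional = sum(len(part) / len(labels) * entropy(part)
                      for part in (present, absent) if part)
    return entropy(labels) - conditional

def expand(tf, related, weight=0.5):
    """Add a fraction of each term's frequency to its related terms."""
    out = Counter(tf)
    for term, freq in tf.items():
        for other in related.get(term, ()):
            out[other] += weight * freq
    return out

def vectorize(tokens, vocabulary, related):
    """Term frequencies, expanded with related words, restricted to the vocabulary."""
    expanded = expand(Counter(tokens), related)
    return {t: f for t, f in expanded.items() if t in vocabulary}

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def knn_predict(train_vecs, train_labels, query, k=3):
    """Majority vote among the k training documents most similar to the query."""
    ranked = sorted(zip(train_vecs, train_labels),
                    key=lambda pair: cosine(pair[0], query), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Toy corpus (assumption: invented for illustration, not the Hamshahri corpus).
docs = [
    ["the", "team", "won", "the", "match"],
    ["the", "player", "scored", "and", "won"],
    ["the", "market", "and", "the", "stocks", "fell"],
    ["the", "bank", "raised", "rates"],
]
labels = ["sport", "sport", "economy", "economy"]

# Hypothetical related-word table standing in for a WordNet lookup.
RELATED = {"game": ["match"], "match": ["game"]}

# Keep only terms whose information gain exceeds an assumed threshold.
vocabulary = {t for d in docs for t in d
              if information_gain(docs, labels, t) > 0.1}

train = [vectorize(d, vocabulary, RELATED) for d in docs]
query = vectorize(["the", "game", "was", "won"], vocabulary, RELATED)
predicted = knn_predict(train, labels, query, k=3)
print(predicted)
```

In this sketch, words such as "the" and "and" are spread evenly across both classes, so their information gain is zero and they drop out of the vocabulary. The query word "game" never appears in the training data, but the related-word expansion transfers part of its frequency to "match", which raises its similarity to the sports documents.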

Parchami, M., Akhtar, B., & Dezfoulian, M. (2012). Persian text classification based on K-NN using wordnet. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7345 LNAI, pp. 283–291). https://doi.org/10.1007/978-3-642-31087-4_30
