An investigation has been conducted on two well-known similarity-based learning approaches to text categorization: the k-nearest neighbor (k-NN) classifier and the Rocchio classifier. After identifying the strengths and weaknesses of each technique, we propose a new classifier, called the kNN model-based classifier, that unifies the strengths of the k-NN and Rocchio classifiers and adapts to the characteristics of text categorization problems. A prototype text categorization system has been implemented and evaluated on two common document corpora, namely the 20-newsgroup collection and the ModApte version of the Reuters-21578 collection of news stories. The experimental results show that the kNN model-based approach outperforms both the k-NN and Rocchio classifiers. © Springer-Verlag 2004.
CITATION STYLE
Guo, G., Wang, H., Bell, D., Bi, Y., & Greer, K. (2004). An kNN model-based approach and its application in text categorization. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2945, 559–570. https://doi.org/10.1007/978-3-540-24630-5_69