Text categorization presents unique challenges due to the large number of attributes present in the data set, large number of training samples, attribute dependency, and multi-modality of categories. Existing classification techniques have limited applicability in the data sets of these natures. In this paper, we present a Weight Ad- justed k-Nearest Neighbor (WAKNN) classification that learns feature weights based on a greedy hill climbing technique. We also present two performance optimizations of WAKNN that improve the computational performance by a few orders of magnitude, but do not compromise on the classification quality. We experimentally evaluated WAKNN on 52 document data sets from a variety of domains and compared its performance against several classification algorithms, such as C4.5, RIPPER, Naive-Bayesian, PEBLS and VSM. Experimental results on these data sets confirm that WAKNN consistently outperforms other existing classification algorithms.
CITATION STYLE
Han, E. H. S., Karypis, G., & Kumar, V. (2001). Text categorization using weight adjusted k-nearest neighbor classification. In Lecture Notes in Artificial Intelligence (Subseries of Lecture Notes in Computer Science) (Vol. 2035, pp. 53–65). Springer Verlag. https://doi.org/10.1007/3-540-45357-1_9
Mendeley helps you to discover research relevant for your work.