An empirical comparison of text categorization methods

Ana Cardoso-Cachopo; Arlindo L. Oliveira

Journal Article


Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2003), vol. 2857, pp. 183–196

DOI: 10.1007/978-3-540-39984-1_14


Abstract

In this paper we present a comprehensive comparison of the performance of several text categorization methods on two different data sets. In particular, we evaluate the Vector and Latent Semantic Analysis (LSA) methods, a classifier based on Support Vector Machines (SVM), and the k-Nearest Neighbor (k-NN) variations of the Vector and LSA models. We report results using the Mean Reciprocal Rank (MRR), an evaluation measure commonly used for question answering tasks, as a measure of overall performance. We argue that this measure is also very well suited to text categorization tasks. Our results show that, overall, SVM and k-NN LSA perform better than the other methods, and that the difference is statistically significant. © Springer-Verlag Berlin Heidelberg 2003.
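
The abstract relies on the Mean Reciprocal Rank without defining it. As a minimal sketch, assuming each classifier returns a ranked list of candidate categories for every test document and each document has exactly one correct category, MRR is the average over all test documents of 1/r, where r is the rank of the correct category (scored 0 if the category does not appear in the list). The function names and toy labels below are hypothetical illustrations, not the authors' code or data.

```python
from typing import Sequence


def reciprocal_rank(ranked_categories: Sequence[str], true_category: str) -> float:
    """Return 1/rank of the true category in a ranked list, or 0.0 if it is absent."""
    # Ranks are 1-based: the top-ranked category contributes 1/1.
    for position, category in enumerate(ranked_categories, start=1):
        if category == true_category:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(
    rankings: Sequence[Sequence[str]], true_labels: Sequence[str]
) -> float:
    """Average the reciprocal rank of the correct category over all test documents."""
    if len(rankings) != len(true_labels):
        raise ValueError("need exactly one true label per ranked list")
    if not rankings:
        raise ValueError("MRR is undefined for an empty test set")
    total = sum(reciprocal_rank(r, t) for r, t in zip(rankings, true_labels))
    return total / len(rankings)


# Hypothetical toy example: three test documents, three categories.
rankings = [
    ["sports", "politics", "business"],  # true label "sports" ranked 1st -> 1
    ["politics", "business", "sports"],  # true label "business" ranked 2nd -> 1/2
    ["business", "sports", "politics"],  # true label "politics" ranked 3rd -> 1/3
]
labels = ["sports", "business", "politics"]

mrr = mean_reciprocal_rank(rankings, labels)
print(mrr)  # (1 + 1/2 + 1/3) / 3, about 0.611
```

Unlike plain top-1 accuracy, which would score this toy example as 1/3, MRR gives partial credit when the correct category is ranked close to the top, which is the property the authors argue makes it suitable for text categorization.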

Citation (APA)

Cardoso-Cachopo, A., & Oliveira, A. L. (2003). An empirical comparison of text categorization methods. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2857, 183–196. https://doi.org/10.1007/978-3-540-39984-1_14
