This paper studies the structure of vectors obtained by applying term selection methods to a high-dimensional text collection. We found that the distance to transition point (DTP) method omits commonly occurring terms, which are poor discriminators between documents but convey important information about a collection. Experimental results on the Reuters-21578 collection with the k-NN classifier show that feature selection by DTP combined with common terms slightly outperforms simple document frequency. © Springer-Verlag Berlin Heidelberg 2005.
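The selection scheme summarized above can be illustrated with a minimal sketch. The abstract does not give the formulas, so everything here is an assumption: the transition point is computed with the commonly cited expression TP = (sqrt(1 + 8·I1) − 1) / 2, where I1 is the number of terms occurring exactly once; DTP is taken as the absolute difference between a term's collection frequency and TP; and "common terms" are taken to be the terms with the highest document frequency. The function names `transition_point` and `select_terms` and the toy documents are hypothetical.

```python
import math
from collections import Counter


def transition_point(term_freq):
    """Assumed formula: TP = (sqrt(1 + 8 * I1) - 1) / 2,
    where I1 is the number of terms with frequency exactly 1."""
    i1 = sum(1 for f in term_freq.values() if f == 1)
    return (math.sqrt(1 + 8 * i1) - 1) / 2


def select_terms(docs, n_dtp, n_common):
    """Return (tp, dtp_terms, common_terms) for tokenized documents.

    dtp_terms: the n_dtp terms whose collection frequency is closest to TP.
    common_terms: the n_common highest document-frequency terms not already
    chosen by DTP (an assumed reading of "common terms").
    """
    term_freq = Counter(t for doc in docs for t in doc)
    doc_freq = Counter(t for doc in docs for t in set(doc))
    tp = transition_point(term_freq)

    # Rank by distance to the transition point; ties broken alphabetically.
    by_dtp = sorted(term_freq, key=lambda t: (abs(term_freq[t] - tp), t))
    dtp_terms = by_dtp[:n_dtp]

    # Rank by descending document frequency; ties broken alphabetically.
    by_df = sorted(doc_freq, key=lambda t: (-doc_freq[t], t))
    common_terms = [t for t in by_df if t not in dtp_terms][:n_common]
    return tp, dtp_terms, common_terms


# Toy collection (hypothetical data, not from Reuters-21578).
docs = [
    ["oil", "price", "the"],
    ["oil", "the", "market"],
    ["the", "wheat", "oil"],
    ["the", "price", "trade"],
]
tp, dtp_terms, common_terms = select_terms(docs, n_dtp=2, n_common=1)
features = dtp_terms + common_terms
```

In this toy collection the most frequent term lies farthest from the transition point, so DTP alone leaves it out, and the common-terms step adds it back to the feature set. The resulting `features` list would then define the vector dimensions passed to a classifier such as k-NN.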
CITATION STYLE
Moyotl-Hernández, E., & Jiménez-Salazar, H. (2005). Enhancement of DTP feature selection method for text categorization. In Lecture Notes in Computer Science (Vol. 3406, pp. 719–722). Springer Verlag. https://doi.org/10.1007/978-3-540-30586-6_80