Text categorization by a machine-learning-based term selection

Javier Fernández; Elena Montañés; Irene Díaz; José Ranilla; Elías F. Combarro

Journal Article

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2004) 3180 253-262

DOI: 10.1007/978-3-540-30075-5_25

Abstract

Term selection is one of the main tasks in Information Retrieval and Text Categorization. It has traditionally been carried out by statistical methods based on how frequently words appear in the documents. This paper presents a method for extracting the relevant words of a document by taking their linguistic information into account. These relevant words are obtained by a Machine Learning algorithm that takes manually selected words as its training set. Text Categorization is then performed with Support Vector Machines on the lexica obtained by this technique. The results are compared with one of the most widely used term selection methods (based on statistical information alone); the new method performs better and has the additional advantage of selecting the filtering level automatically. © Springer-Verlag Berlin Heidelberg 2004.

Citation (APA)

Fernández, J., Montañés, E., Díaz, I., Ranilla, J., & Combarro, E. F. (2004). Text categorization by a machine-learning-based term selection. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 3180, 253–262. https://doi.org/10.1007/978-3-540-30075-5_25
