MEDLINE abstracts classification based on noun phrases extraction

Fernando Ruiz-Rico; José Luis Vicedo; María Consuelo Rubio-Sánchez

Conference Proceedings

Communications in Computer and Information Science (2008) 25 CCIS 507-519

DOI: 10.1007/978-3-540-92219-3_38

Abstract

Many algorithms for automated text categorization have emerged in recent years. They have been studied exhaustively, leading to several variants and combinations, not only in the procedures themselves but also in the treatment of the input data. A widely used approach is to represent documents as a Bag-Of-Words (BOW) and weight tokens with the TFIDF scheme. Many researchers have worked to improve precision and recall and to reduce classification time by enriching BOW with stemming, n-grams, feature selection, noun phrases, metadata, weight normalization, etc. We contribute to this field with a novel combination of these techniques. For evaluation purposes, we compare against previous work that uses SVM with the simple BOW representation. The well-known OHSUMED corpus is used, and different sets of categories are selected, as previously done in the literature. The conclusion is that the proposed method can be successfully applied to existing binary classifiers such as SVM, outperforming the combination of BOW and TFIDF. © 2008 Springer-Verlag.
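For readers unfamiliar with the baseline the abstract refers to, the BOW representation with TFIDF weighting can be sketched roughly as follows. This is only an illustration of the baseline, not the authors' noun-phrase method. The whitespace tokenizer, the plain `tf * log(N / df)` formula (one of several TFIDF variants), the function name `tfidf_vectors`, and the three sample strings are all assumptions made for this sketch; the sample strings are not taken from OHSUMED.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {token: weight} dict per document using tf * log(N / df).

    Hypothetical helper for illustration; real systems typically add
    stemming, stop-word removal, and weight normalization.
    """
    # Bag-of-words: lowercase and split on whitespace (an assumed tokenizer).
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each token occurs.
    df = Counter(tok for toks in tokenized for tok in set(toks))

    vectors = []
    for toks in tokenized:
        tf = Counter(toks)  # raw term counts within this document
        vectors.append(
            {tok: count * math.log(n_docs / df[tok]) for tok, count in tf.items()}
        )
    return vectors


# Invented sample texts, for illustration only.
docs = [
    "heart disease risk",
    "heart surgery outcome",
    "lung disease treatment",
]
vectors = tfidf_vectors(docs)
```

In this sketch a token that occurs in fewer documents (such as "risk") receives a larger weight than one shared across documents (such as "heart"). The resulting sparse vectors are the kind of input a binary classifier such as SVM would be trained on.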

Cite

Ruiz-Rico, F., Vicedo, J. L., & Rubio-Sánchez, M. C. (2008). MEDLINE abstracts classification based on noun phrases extraction. In Communications in Computer and Information Science (Vol. 25 CCIS, pp. 507–519). https://doi.org/10.1007/978-3-540-92219-3_38
