Comparative Study of Feature Selection Methods for Medical Full Text Classification

Carlos Adriano Gonçalves; Eva Lorenzo Iglesias; Lourdes Borrajo; Rui Camacho; Adrián Seara Vieira; Célia Talma Gonçalves

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2019), Vol. 11466 LNBI, pp. 550–560

DOI: 10.1007/978-3-030-17935-9_49

Abstract

Much of the work in text categorization uses only the title and abstract of papers. A full paper, however, contains far more information that could be used to improve classification performance. This potential benefit comes with an additional problem: the increased size of the data sets. To cope with the increased size of full-text data sets, we performed an assessment study on the use of feature selection methods for full-text classification. We compared two existing feature selection methods (Information Gain and Correlation) with a novel method called k-Best-Discriminative-Terms. The assessment was conducted using the Ohsumed corpora. We ran two sets of experiments: one using title and abstract only, and one using full text. The results show that the novel method does not perform well on small amounts of text such as title and abstract, but performs much better on the full-text data sets and requires a much smaller number of attributes.
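To make the feature selection step concrete, the following is a minimal sketch of Information Gain, one of the two existing methods named in the abstract. It is not the authors' implementation: the function names (`entropy`, `information_gain`, `top_k_terms`), the binary "term present or absent" representation, and the toy documents are all assumptions made for illustration, and the data does not come from the Ohsumed corpora. The novel k-Best-Discriminative-Terms method is not described in the abstract, so it is not sketched here.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(docs, labels, term):
    """Information gain of a binary 'term present / absent' feature.

    docs is a list of sets of terms, labels the matching class labels.
    """
    present = [y for d, y in zip(docs, labels) if term in d]
    absent = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    # Class entropy remaining after splitting on the term.
    conditional = (len(present) / n) * entropy(present) \
        + (len(absent) / n) * entropy(absent)
    return entropy(labels) - conditional


def top_k_terms(docs, labels, k):
    """Return the k terms with the highest information gain.

    Ties are broken alphabetically so the result is deterministic.
    """
    vocabulary = set().union(*docs)
    ranked = sorted(
        vocabulary,
        key=lambda t: (-information_gain(docs, labels, t), t),
    )
    return ranked[:k]


# Hypothetical toy corpus: each document is reduced to its set of terms.
docs = [
    {"tumor", "therapy", "patient"},
    {"tumor", "biopsy", "patient"},
    {"cardiac", "therapy", "patient"},
    {"cardiac", "stent", "patient"},
]
labels = ["oncology", "oncology", "cardiology", "cardiology"]

selected = top_k_terms(docs, labels, 2)
print(selected)
```

In this toy corpus, a term such as "patient" that occurs in every document carries no information about the class, while terms that occur in only one class rank highest. With full texts the vocabulary grows much larger, which is why keeping only the top-ranked attributes matters for the data set sizes the abstract mentions.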

Adriano Gonçalves, C., Lorenzo Iglesias, E., Borrajo, L., Camacho, R., Seara Vieira, A., & Talma Gonçalves, C. (2019). Comparative Study of Feature Selection Methods for Medical Full Text Classification. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11466 LNBI, pp. 550–560). Springer Verlag. https://doi.org/10.1007/978-3-030-17935-9_49
