Comparative Study of Feature Selection Methods for Medical Full Text Classification

Abstract

Much of the work in text categorization uses only the titles and abstracts of papers. A full paper, however, contains far more information that could be used to improve classification performance. This potential benefit comes with an additional problem: larger data sets. To cope with the increased size of full-text data sets, we carried out an assessment study of feature selection methods for full-text classification. We compared two existing feature selection methods (Information Gain and Correlation) with a novel method called k-Best-Discriminative-Terms. The assessment was conducted on the Ohsumed corpora in two sets of experiments: one using title and abstract only, and one using the full text. The results show that the novel method does not perform well on small amounts of text such as title and abstract, but performs much better on the full-text data sets and requires a much smaller number of attributes.
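
As an illustration of one of the existing methods named above, the following is a minimal sketch of Information Gain used to rank terms for selection. It is not the authors' implementation: the toy documents, the labels, and the helper names `entropy`, `information_gain`, and `select_top_terms` are all assumptions made for this example, and each term is treated as a simple binary feature (present or absent in a document).

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(docs, labels, term):
    """Information gain of the binary feature 'term occurs in the document'.

    docs is a list of sets of terms; labels is the parallel list of classes.
    """
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = (
        (len(with_term) / n) * entropy(with_term)
        + (len(without_term) / n) * entropy(without_term)
    )
    return entropy(labels) - conditional


def select_top_terms(docs, labels, k):
    """Return the k terms with the highest information gain.

    Ties are broken alphabetically so the result is deterministic.
    """
    vocabulary = sorted(set().union(*docs))
    scored = [(information_gain(docs, labels, t), t) for t in vocabulary]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [t for _, t in scored[:k]]


# Hypothetical toy corpus (not taken from Ohsumed): each document is a set of terms.
docs = [
    {"heart", "attack", "patient"},
    {"heart", "failure", "patient"},
    {"tumor", "growth", "patient"},
    {"tumor", "biopsy", "patient"},
]
labels = ["cardio", "cardio", "onco", "onco"]

print(select_top_terms(docs, labels, 2))
```

In this toy corpus, a term such as "patient" that appears in every document carries no information about the class, while terms that occur in only one class score highest. Keeping only the top-ranked terms is how a selection step of this kind reduces the number of attributes.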

Cite

Adriano Gonçalves, C., Lorenzo Iglesias, E., Borrajo, L., Camacho, R., Seara Vieira, A., & Talma Gonçalves, C. (2019). Comparative Study of Feature Selection Methods for Medical Full Text Classification. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11466 LNBI, pp. 550–560). Springer Verlag. https://doi.org/10.1007/978-3-030-17935-9_49
