The role of feature selection in text mining in the process of discovering missing clinical annotations – Case study

Aleksander Płaczek; Alicja Płuciennik; Mirosław Pach; Michał Jarząb; Dariusz Mrozek

Conference Proceedings

Communications in Computer and Information Science (2019) 1018 248-262

DOI: 10.1007/978-3-030-19093-4_19

Abstract

The vocabulary doctors use to describe the results of medical procedures changes as new standards are introduced. Text data that a medical professional understands immediately is difficult to use in large-scale analysis. Extracting data relevant to a given case, e.g. the Bethesda class, means normalizing free-form text and all the grammatical forms associated with it. This is particularly difficult in Polish, where words change their form significantly according to their function in the sentence. We found common black-box text mining methods inaccurate for this purpose. Here we describe a word-frequency-based method for annotating text data for Bethesda class extraction. We compare it with an algorithm based on the C4.5 decision tree. We show how important the choice of method and the range of features are for avoiding conflicting classifications. The proposed algorithms made it possible to avoid the limitations of a rule base.
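The abstract does not give the details of the word-frequency-based method, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a crude prefix truncation as a stand-in for normalizing inflected word forms, scores each class by the relative frequency of matching stems, and returns no label when the evidence is absent or tied (a conflicting classification). The function names, the `stem_len` parameter, the class labels, and the English toy descriptions are all hypothetical.

```python
import re
from collections import Counter, defaultdict


def tokenize(text, stem_len=5):
    """Lowercase, keep letter runs only, and cut each word to a crude prefix stem.

    Prefix truncation is an assumption made for illustration; it only
    approximates the normalization of inflected forms.
    """
    return [word[:stem_len] for word in re.findall(r"[^\W\d_]+", text.lower())]


def train(samples):
    """Count stem frequencies per class label from (text, label) pairs."""
    freq = defaultdict(Counter)
    for text, label in samples:
        freq[label].update(tokenize(text))
    return freq


def classify(text, freq):
    """Return the best-scoring label, or None if there is no evidence or a tie."""
    tokens = tokenize(text)
    scores = {}
    for label, counts in freq.items():
        total = sum(counts.values())
        # Sum of relative frequencies of the text's stems within this class.
        scores[label] = sum(counts[t] / total for t in tokens)
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    if not ranked or ranked[0][1] == 0:
        return None  # no known stem appeared in the text
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # conflicting classification: top two classes tie
    return ranked[0][0]


# Hypothetical toy data; real input would be free-form clinical descriptions.
samples = [
    ("benign follicular nodule", "II"),
    ("benign colloid nodule", "II"),
    ("suspicious for follicular neoplasm", "IV"),
    ("follicular neoplasm suspected", "IV"),
]
model = train(samples)
print(classify("benign nodules", model))
print(classify("neoplasms", model))
```

In this sketch the choice of features is the tokenizer: changing `stem_len` or the set of retained words changes which stems are shared between classes, and therefore how often two classes end up with competing scores.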

Cite

Płaczek, A., Płuciennik, A., Pach, M., Jarząb, M., & Mrozek, D. (2019). The role of feature selection in text mining in the process of discovering missing clinical annotations – Case study. In Communications in Computer and Information Science (Vol. 1018, pp. 248–262). Springer Verlag. https://doi.org/10.1007/978-3-030-19093-4_19
