The vocabulary doctors use to describe the results of medical procedures changes as new standards are introduced. Text data, although immediately understandable to a medical professional, is difficult to use in large-scale analysis. Extracting the data relevant to a given case, e.g. the Bethesda class, means normalizing free-form text together with all the grammatical forms it can take. This is particularly difficult in Polish, where words change their form significantly depending on their function in the sentence. We found common black-box text mining methods inaccurate for this purpose. Here we describe a word-frequency-based method for annotating text data for Bethesda class extraction, and we compare it with an algorithm based on the C4.5 decision tree. We show how important the choice of method and the range of features are for avoiding conflicting classifications. The proposed algorithms made it possible to avoid the limitations of a rule-based approach.
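To make the idea of word-frequency-based annotation more concrete, the following is a minimal sketch and not the authors' implementation. It assumes that inflected Polish word forms can be crudely normalized by truncating each token to a fixed-length prefix, and that a class is chosen by summing per-class relative token frequencies. The function names (`normalize`, `train`, `classify`), the `prefix_len` parameter, and the toy phrases are all hypothetical and serve only as illustration.

```python
import re
from collections import Counter, defaultdict


def normalize(text, prefix_len=5):
    """Lowercase, keep only letter runs, and truncate each token to a prefix.

    Truncation is an assumed, crude stand-in for proper Polish lemmatization.
    """
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    return [t[:prefix_len] for t in tokens]


def train(labeled_texts, prefix_len=5):
    """Count normalized token frequencies separately for each class label."""
    counts = defaultdict(Counter)
    for text, label in labeled_texts:
        counts[label].update(normalize(text, prefix_len))
    return counts


def classify(text, counts, prefix_len=5):
    """Return the label with the highest summed relative token frequency.

    Returns None when no token of the text was seen during training.
    """
    tokens = normalize(text, prefix_len)
    scores = {}
    for label, counter in counts.items():
        total = sum(counter.values())
        scores[label] = sum(counter[t] / total for t in tokens)
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


# Hypothetical toy descriptions; real clinical reports are far more varied.
toy_data = [
    ("zmiana łagodna", "II"),
    ("podejrzenie nowotworu pęcherzykowego", "IV"),
]
model = train(toy_data)
print(classify("cechy zmiany łagodnej", model))  # prints II
```

In this sketch the inflected forms "zmiany" and "łagodnej" match the training forms "zmiana" and "łagodna" only because they share a five-letter prefix. A feature range chosen too narrowly or too broadly would let unrelated words collide or related forms diverge, which is one way conflicting classifications can arise.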
Płaczek, A., Płuciennik, A., Pach, M., Jarząb, M., & Mrozek, D. (2019). The role of feature selection in text mining in the process of discovering missing clinical annotations – Case study. In Communications in Computer and Information Science (Vol. 1018, pp. 248–262). Springer Verlag. https://doi.org/10.1007/978-3-030-19093-4_19