Automatic text classification for label imputation of medical diagnosis notes based on random forest

Abstract

Electronic medical records (EMRs) contain a wealth of patient information that is of great value for data mining in a variety of clinical applications. However, missing information, including missing labels, is pervasive in real-world EMRs and poses substantial obstacles to processing medical text. The aim of this study is to apply automatic text classification to recover missing labels in EMR text and thereby support downstream analyses. A combination of word-embedding technology and random forest classifiers is applied to identify multiple note labels, including disease type and examination type, from the short texts of medical imaging diagnosis notes. The average binary classification accuracy is 91%. These results indicate that advanced NLP techniques applied to EMRs can achieve high classification accuracy.
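The abstract names the ingredients (word embeddings plus random forest classifiers, several binary note labels) but not how they are wired together. The sketch below is one plausible reading, not the authors' implementation: it assumes each note is represented by the average of its word vectors and that each label gets its own binary random forest (binary relevance), which would be consistent with reporting an average of binary accuracies. The vocabulary, vectors, notes, label names (`lung_disease`, `ct_exam`) and all function names are hypothetical. The forest is written in plain Python only to stay dependency-free; a real pipeline would more likely use trained embeddings and a library such as scikit-learn's `RandomForestClassifier`.

```python
import random
from collections import Counter

# Hypothetical toy data (not from the paper): 3-d "word embeddings" where
# dim 0 ~ lung, dim 1 ~ liver, dim 2 ~ CT (+) vs. X-ray/ultrasound (-).
VECTORS = {
    "lung": [1.0, 0.0, 0.0], "nodule": [0.8, 0.1, 0.0],
    "liver": [0.0, 1.0, 0.0], "cyst": [0.1, 0.8, 0.0],
    "ct": [0.0, 0.0, 1.0], "xray": [0.0, 0.0, -1.0], "ultrasound": [0.0, 0.0, -1.0],
}
NOTES = ["lung nodule ct", "lung nodule xray", "liver cyst ct", "liver cyst ultrasound",
         "lung ct", "nodule xray", "liver ultrasound", "cyst ct"]
# Two hypothetical binary labels per note: a disease-type and an examination-type label.
LABELS = {"lung_disease": [1, 1, 0, 0, 1, 1, 0, 0],
          "ct_exam": [1, 0, 1, 0, 1, 0, 0, 1]}


def embed_note(tokens, vectors, dim):
    """Average the vectors of known tokens into one fixed-length note vector."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return [0.0] * dim  # no known words: fall back to the zero vector
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def build_tree(X, y, depth, max_depth, n_feats, rng):
    """CART-style tree: leaves are labels, inner nodes are (feature, threshold, left, right)."""
    if depth == max_depth or len(set(y)) == 1:
        return majority(y)
    best = None
    for f in rng.sample(range(len(X[0])), n_feats):  # random feature subset per split
        for thr in sorted({row[f] for row in X}):
            left = [i for i, row in enumerate(X) if row[f] <= thr]
            right = [i for i, row in enumerate(X) if row[f] > thr]
            if not right:  # threshold at the maximum value splits nothing
                continue
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right])) / len(y)
            if best is None or score < best[0]:
                best = (score, f, thr, left, right)
    if best is None:  # every sampled feature is constant in this node
        return majority(y)
    _, f, thr, left, right = best

    def grow(idx):
        return build_tree([X[i] for i in idx], [y[i] for i in idx],
                          depth + 1, max_depth, n_feats, rng)

    return (f, thr, grow(left), grow(right))


def predict_tree(node, x):
    while isinstance(node, tuple):
        f, thr, left, right = node
        node = left if x[f] <= thr else right
    return node


def fit_forest(X, y, n_trees=25, max_depth=5, seed=0):
    """Bootstrap-aggregated trees with sqrt(d) candidate features per split."""
    rng = random.Random(seed)
    n_feats = max(1, int(len(X[0]) ** 0.5))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]  # bootstrap sample, with replacement
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                 0, max_depth, n_feats, rng))
    return forest


def predict_forest(forest, x):
    return majority([predict_tree(tree, x) for tree in forest])


def average_binary_accuracy(models, X, labels):
    """Mean of per-label binary accuracies (one reading of the abstract's metric)."""
    accs = [sum(predict_forest(models[name], x) == t for x, t in zip(X, y)) / len(y)
            for name, y in labels.items()]
    return sum(accs) / len(accs)


if __name__ == "__main__":
    X = [embed_note(note.split(), VECTORS, 3) for note in NOTES]
    # Binary relevance: one independent forest per label.
    models = {name: fit_forest(X, y) for name, y in LABELS.items()}
    query = embed_note("nodule ct".split(), VECTORS, 3)
    print({name: predict_forest(model, query) for name, model in models.items()})
```

Averaging word vectors discards word order, which is a reasonable trade-off for short diagnosis notes but remains an assumption here; `n_trees`, `max_depth`, and the sqrt(d) feature subset are common defaults chosen for illustration, not values reported in the paper.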

Citation (APA)

Yang, B., Dai, G., Yang, Y., Tang, D., Li, Q., Lin, D., … Cai, Y. (2018). Automatic text classification for label imputation of medical diagnosis notes based on random forest. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11148 LNCS, pp. 87–97). Springer Verlag. https://doi.org/10.1007/978-3-030-01078-2_8