Abstract
Clinical text classification of electronic medical records is a challenging task. Existing electronic records suffer from irrelevant text, misspellings, semantic ambiguity, and abbreviations. The approach reported in this paper builds on machine learning techniques to develop an intelligent framework for classifying a medical transcription dataset. The proposed approach consists of four main phases: text preprocessing, word representation, feature reduction, and classification. We used four machine learning algorithms (support vector machines, naïve Bayes, logistic regression, and k-nearest neighbors) in combination with different word representation models, applying each algorithm to bag-of-words, TF-IDF, and Word2vec representations. Experimental results were evaluated using precision, recall, accuracy, and F1 score. The best results were obtained by combining the k-NN classifier with Word2vec word representations, achieving an accuracy of 92% in classifying medical specialties from the transcription text.
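To make the pipeline in the abstract concrete, the following is a minimal sketch of three of its phases (preprocessing, word representation, classification) using only the Python standard library. It is not the authors' implementation: it uses TF-IDF rather than the Word2vec representation that gave the best reported result, it omits the feature reduction phase, and the training snippets, the specialty labels, and all function names (`tokenize`, `fit_idf`, `tfidf_vector`, `knn_predict`, `classify`) are hypothetical, chosen only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Preprocessing: lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def fit_idf(docs_tokens):
    """Compute a smoothed inverse document frequency for every training term."""
    n_docs = len(docs_tokens)
    doc_freq = Counter()
    for tokens in docs_tokens:
        doc_freq.update(set(tokens))
    return {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}


def tfidf_vector(tokens, idf):
    """Word representation: L2-normalised sparse TF-IDF vector (dict term -> weight)."""
    counts = Counter(t for t in tokens if t in idf)  # ignore unseen terms
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def cosine(a, b):
    """Cosine similarity of two L2-normalised sparse vectors (a dot product)."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def knn_predict(query_vec, train_vecs, train_labels, k=3):
    """Classification: majority vote among the k most similar training documents."""
    ranked = sorted(
        zip(train_vecs, train_labels),
        key=lambda pair: cosine(query_vec, pair[0]),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data; NOT taken from the medical transcription dataset.
TRAIN = [
    ("patient reports chest pain and shortness of breath ecg shows arrhythmia", "Cardiology"),
    ("echocardiogram reveals reduced ejection fraction heart failure", "Cardiology"),
    ("chest pain radiating to left arm elevated troponin", "Cardiology"),
    ("fracture of the left femur after fall surgical fixation planned", "Orthopedic"),
    ("knee pain with torn meniscus arthroscopy recommended", "Orthopedic"),
    ("x ray shows fracture of the distal radius cast applied", "Orthopedic"),
]

train_tokens = [tokenize(text) for text, _ in TRAIN]
train_labels = [label for _, label in TRAIN]
idf = fit_idf(train_tokens)
train_vecs = [tfidf_vector(tokens, idf) for tokens in train_tokens]


def classify(text, k=3):
    """Run the full sketch pipeline on one transcription snippet."""
    return knn_predict(tfidf_vector(tokenize(text), idf), train_vecs, train_labels, k)


if __name__ == "__main__":
    print(classify("patient with chest pain and abnormal ecg"))
    print(classify("fracture of the radius after fall"))
```

Swapping `tfidf_vector` for an averaged word-embedding lookup would bring the sketch closer to the Word2vec plus k-NN combination described in the abstract, and the other three classifiers could be substituted for `knn_predict` in the same way.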
Almazaydeh, L., Abuhelaleh, M., Tawil, A. A., & Elleithy, K. (2023). Clinical Text Classification with Word Representation Features and Machine Learning Algorithms. International Journal of Online and Biomedical Engineering, 19(4), 65–76. https://doi.org/10.3991/ijoe.v19i04.36099