Application of Machine Learning and Word Embeddings in the Classification of Cancer Diagnosis Using Patient Anamnesis

Andres Alejandro Ramos Magna; Hector Allende-Cid; Carla Taramasco; Carlos Becerra; Rosa L. Figueroa

Journal Article, Open Access

IEEE Access (2020) 8 106198-106213

DOI: 10.1109/ACCESS.2020.3000075

39 Citations

92 Readers

Abstract

Currently, one of the main challenges for information systems in healthcare is supporting health professionals in disease classification. This work presents an innovative method for a recommendation system for the diagnosis of breast cancer using patient medical histories. Natural language processing (NLP) techniques were applied to two real datasets. The first comprised 160,560 medical histories of anonymous patients from a hospital in Chile, covering the following categories: breast cancer, cysts and nodules, other cancer, breast cancer surgeries, and other diagnoses. The second was obtained from the MIMIC III dataset. Using word-embedding techniques, such as word2vec's skip-gram and BERT, together with machine learning techniques, a recommendation system was implemented as a tool to support the physician's decision-making. The results indicate that word embeddings can yield a good-quality recommendation system. For anamnesis written in Spanish, 20 experiments with 5-fold cross-validation yielded an F1 of 0.980 ± 0.0014 on the classification of 'cancer' versus 'not cancer' and 0.986 ± 0.0014 for 'breast cancer' versus 'other cancer'. Similar results were obtained with the MIMIC III dataset.
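
The evaluation protocol mentioned in the abstract (repeated 5-fold cross-validation reported as mean F1 ± standard deviation) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the two-dimensional toy word vectors stand in for word2vec or BERT embeddings, the nearest-centroid classifier stands in for whichever models the authors trained, the notes are invented, and the stratified fold construction is one possible choice among several.

```python
import random
import statistics

def f1_score(y_true, y_pred, positive=1):
    """F1 for the positive class: harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def embed(text, vectors, dim):
    """Average the vectors of known tokens (zero vector if none are known)."""
    known = [vectors[tok] for tok in text.lower().split() if tok in vectors]
    if not known:
        return [0.0] * dim
    return [sum(col) / len(known) for col in zip(*known)]

def nearest_centroid_predict(train_X, train_y, test_X):
    """Stand-in classifier: assign each test point to the closest class mean."""
    centroids = {}
    for label in set(train_y):
        rows = [x for x, lab in zip(train_X, train_y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    def sq_dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return [min(centroids, key=lambda lab: sq_dist(x, centroids[lab])) for x in test_X]

def repeated_kfold_f1(X, y, k=5, repeats=20, seed=0):
    """Run stratified k-fold `repeats` times; return mean and std of per-run F1."""
    rng = random.Random(seed)
    run_scores = []
    for _ in range(repeats):
        # Group indices by label, shuffle each group, deal them round-robin
        # into folds so every fold keeps the overall class balance.
        by_label = {}
        for i, label in enumerate(y):
            by_label.setdefault(label, []).append(i)
        folds = [[] for _ in range(k)]
        for members in by_label.values():
            rng.shuffle(members)
            for pos, i in enumerate(members):
                folds[pos % k].append(i)
        fold_scores = []
        for test_idx in folds:
            held_out = set(test_idx)
            train_idx = [i for i in range(len(X)) if i not in held_out]
            preds = nearest_centroid_predict(
                [X[i] for i in train_idx], [y[i] for i in train_idx],
                [X[i] for i in test_idx])
            fold_scores.append(f1_score([y[i] for i in test_idx], preds))
        run_scores.append(statistics.mean(fold_scores))
    return statistics.mean(run_scores), statistics.stdev(run_scores)

# Hypothetical toy vectors and notes; label 1 = 'cancer', 0 = 'not cancer'.
vectors = {"tumor": [1.0, 0.0], "carcinoma": [0.9, 0.1],
           "fracture": [0.0, 1.0], "sprain": [0.1, 0.9]}
texts = ["patient with tumor", "carcinoma confirmed", "tumor and carcinoma",
         "suspected tumor", "carcinoma follow-up"] * 2 + \
        ["wrist fracture", "ankle sprain", "fracture and sprain",
         "old fracture", "mild sprain"] * 2
y = [1] * 10 + [0] * 10
X = [embed(t, vectors, dim=2) for t in texts]

mean_f1, std_f1 = repeated_kfold_f1(X, y, k=5, repeats=20)
print(f"F1 = {mean_f1:.3f} ± {std_f1:.4f}")
```

The toy data is trivially separable, so the score it produces says nothing about the figures reported in the abstract; the sketch only shows how per-fold F1 values are averaged within a run and then summarized across runs.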

Citation (APA)

Magna, A. A. R., Allende-Cid, H., Taramasco, C., Becerra, C., & Figueroa, R. L. (2020). Application of Machine Learning and Word Embeddings in the Classification of Cancer Diagnosis Using Patient Anamnesis. IEEE Access, 8, 106198–106213. https://doi.org/10.1109/ACCESS.2020.3000075
