Interpretable segmentation of medical free-text records based on word embeddings

Adam Gabriel Dobrakowski; Agnieszka Mykowiecka; Małgorzata Marciniak; Wojciech Jaworski; Przemysław Biecek

Journal Article, Open Access

Journal of Intelligent Information Systems (2021) 57(3) 447-465

DOI: 10.1007/s10844-021-00659-4

Abstract

Medical free-text records store a lot of useful information that can be exploited in developing computer-supported medicine. However, extracting knowledge from unstructured text is difficult and depends on the language. In this paper, we apply Natural Language Processing methods to process raw medical texts in Polish and propose a new methodology for clustering patients’ visits. We (1) extract medical terminology from a corpus of free-text clinical records, (2) annotate the data with medical concepts, (3) compute vector representations of medical concepts and validate them on the proposed term analogy tasks, (4) compute visit representations as vectors, (5) introduce a new method for clustering patients’ visits, and (6) apply the method to a corpus of 100,000 visits. We use several approaches to visual exploration that facilitate the interpretation of segments. With our method, we obtain stable and separated segments of visits, which are positively validated against final medical diagnoses. We show how an algorithm for segmenting medical free-text records may be used to aid medical doctors. In addition, we share an implementation of the described methods, with examples, as the open-source R package memr.
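
Steps (4) and (5) of the abstract can be pictured with a minimal Python sketch. It rests on two illustrative assumptions that the abstract does not confirm: a visit vector is taken to be the mean of the embeddings of the concepts annotated in that visit, and the visits are grouped with plain k-means. The concept names, the 2-D embedding values and the helper functions are all hypothetical, and the sketch does not use the API of the memr package.

```python
from math import sqrt

# Hypothetical 2-D concept embeddings (invented toy values, not from the paper).
CONCEPT_VECTORS = {
    "cough": (0.9, 0.1),
    "fever": (0.8, 0.2),
    "sore_throat": (1.0, 0.0),
    "knee_pain": (0.1, 0.9),
    "swelling": (0.2, 0.8),
    "fracture": (0.0, 1.0),
}


def visit_vector(concepts):
    """Assumption: a visit is represented by the mean of its concept embeddings."""
    vectors = [CONCEPT_VECTORS[c] for c in concepts]
    return tuple(sum(component) / len(vectors) for component in zip(*vectors))


def distance(a, b):
    """Euclidean distance between two vectors of equal length."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, iterations=10):
    """Plain k-means, initialised deterministically with the first k points."""
    centroids = list(points[:k])
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: distance(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(
                    sum(component) / len(members) for component in zip(*members)
                )
    return labels


# Four toy visits, each given as the list of concepts annotated in it.
VISITS = [
    ["cough", "fever"],
    ["knee_pain", "swelling"],
    ["sore_throat", "cough"],
    ["fracture", "knee_pain"],
]

visit_vectors = [visit_vector(v) for v in VISITS]
labels = kmeans(visit_vectors, k=2)
print(labels)  # [0, 1, 0, 1]
```

In this toy data the two respiratory visits land in one segment and the two orthopaedic visits in the other. With real embeddings the number of segments and the initialisation would need to be chosen and checked for stability.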

Citation (APA)

Dobrakowski, A. G., Mykowiecka, A., Marciniak, M., Jaworski, W., & Biecek, P. (2021). Interpretable segmentation of medical free-text records based on word embeddings. Journal of Intelligent Information Systems, 57(3), 447–465. https://doi.org/10.1007/s10844-021-00659-4
