Medical free-text records contain a great deal of useful information that can be exploited in developing computer-supported medicine. However, extracting knowledge from unstructured text is difficult and depends on the language. In this paper, we apply Natural Language Processing methods to raw medical texts in Polish and propose a new methodology for clustering patients’ visits. We (1) extract medical terminology from a corpus of free-text clinical records, (2) annotate the data with medical concepts, (3) compute vector representations of medical concepts and validate them on the proposed term analogy tasks, (4) compute visit representations as vectors, (5) introduce a new method for clustering patients’ visits and (6) apply the method to a corpus of 100,000 visits. We use several approaches to visual exploration that facilitate the interpretation of segments. With our method, we obtain stable and well-separated segments of visits, which are positively validated against final medical diagnoses. We show how an algorithm for segmenting medical free-text records may be used to aid medical doctors. In addition, we share an implementation of the described methods, with examples, as the open-source R package memr.
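Steps (4) and (5) can be illustrated with a minimal sketch. The abstract does not say how visit vectors are built or which clustering algorithm is used, so everything here is an assumption made for illustration: a visit is represented as the mean of its concept embeddings, and the visits are grouped with plain k-means. The embedding values, the concept names and the helper functions `visit_vector` and `kmeans` are hypothetical and are not taken from the paper or from the memr package.

```python
from math import dist

# Toy 2-dimensional concept embeddings (made-up values, not from the paper).
EMBEDDINGS = {
    "cough": (0.9, 0.1),
    "fever": (0.8, 0.2),
    "rash": (0.1, 0.9),
    "itching": (0.2, 0.8),
}


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return tuple(sum(v[i] for v in vectors) / n for i in range(len(vectors[0])))


def visit_vector(concepts, embeddings=EMBEDDINGS):
    """Assumed visit representation: the mean of the known concept embeddings."""
    known = [embeddings[c] for c in concepts if c in embeddings]
    if not known:
        raise ValueError("visit contains no known concepts")
    return mean_vector(known)


def kmeans(points, k, iterations=20):
    """Plain k-means with a deterministic start: the first k points are the centroids."""
    centroids = list(points[:k])
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest centroid (Euclidean distance).
        labels = [min(range(k), key=lambda j: dist(p, centroids[j])) for p in points]
        # Move each centroid to the mean of its members; keep it if it has none.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centroids[j] = mean_vector(members)
    return labels


# Each visit is the list of medical concepts annotated in its free text.
visits = [["cough", "fever"], ["rash", "itching"], ["fever"], ["itching"]]
vectors = [visit_vector(v) for v in visits]
labels = kmeans(vectors, k=2)
print(labels)
```

In this toy setting the two respiratory visits share one label and the two dermatological visits share the other. A real corpus would need learned embeddings and a principled choice of the number of clusters.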
Dobrakowski, A. G., Mykowiecka, A., Marciniak, M., Jaworski, W., & Biecek, P. (2021). Interpretable segmentation of medical free-text records based on word embeddings. Journal of Intelligent Information Systems, 57(3), 447–465. https://doi.org/10.1007/s10844-021-00659-4