Similarity search for the content of medical records using unstructured data

Abstract

Clustering large amounts of unstructured data is an important challenge in contemporary medicine and biology. This article presents an automatic clustering method for unstructured medical data. The method consists of the following main steps: transforming the document corpus into a term-frequency matrix; reducing the dimensionality of that matrix using principal component analysis (PCA); directly comparing pairs of documents using similarity measures based on the cosine and correlation distances; and, for expertly labelled data sets, finding the optimal number of groups by treating clustering as an optimization problem in which the objective function is an F measure, optimized by selecting parameter values such as the PCA resolution and the similarity threshold for document pairs. The usefulness of the proposed methodology was demonstrated by performing calculations on three data sets: short sentences divided into three themes, radiological reports of aneurysms, and radiological reports of abdomen studies. A common barrier in clustering unstructured data is the difficulty of interpreting the results. To overcome this limitation, several presentation methods are described, including group histograms, similarity matrices, plots of document assignment to the clusters found, F-measure interpolation, and alphabetical and term-frequency dictionaries. Excluding the labelling step, the presented method is completely automated and can be used as a preliminary data analysis method for large bodies of text to discover potential groups of interesting topics.
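The steps listed in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions: tokenization is plain whitespace splitting, the PCA step is omitted to keep the example dependency-free, groups are formed as connected components of documents whose cosine similarity meets the threshold, and the F measure is computed over document pairs (one common definition; the paper may use another). All function names and the sample sentences are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def term_frequency_matrix(docs):
    """Return (vocabulary, rows); rows[i][j] counts term j in document i."""
    tokenized = [doc.lower().split() for doc in docs]  # assumption: whitespace tokens
    vocab = sorted({tok for toks in tokenized for tok in toks})
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        rows.append([counts[term] for term in vocab])
    return vocab, rows


def cosine_similarity(u, v):
    """Cosine of the angle between two term-count vectors (0.0 if either is empty)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def group_by_threshold(rows, threshold):
    """Link pairs with similarity >= threshold; labels are connected components."""
    parent = list(range(len(rows)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i, j in combinations(range(len(rows)), 2):
        if cosine_similarity(rows[i], rows[j]) >= threshold:
            parent[find(i)] = find(j)
    return [find(i) for i in range(len(rows))]


def pairwise_f_measure(predicted, expert):
    """F measure over document pairs: a pair is positive if both share a group."""
    tp = fp = fn = 0
    for i, j in combinations(range(len(expert)), 2):
        same_pred = predicted[i] == predicted[j]
        same_true = expert[i] == expert[j]
        if same_pred and same_true:
            tp += 1
        elif same_pred:
            fp += 1
        elif same_true:
            fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(docs, expert, candidates):
    """Grid search: return (best F measure, threshold that achieved it)."""
    _, rows = term_frequency_matrix(docs)
    scored = [
        (pairwise_f_measure(group_by_threshold(rows, t), expert), t)
        for t in candidates
    ]
    return max(scored)


# Hypothetical toy corpus with expert labels (0 = aneurysm, 1 = abdomen).
docs = [
    "aneurysm of the left artery",
    "artery aneurysm detected",
    "liver lesion in abdomen",
    "abdomen liver normal",
]
expert = [0, 0, 1, 1]
print(best_threshold(docs, expert, [0.0, 0.5, 0.55, 0.9]))
```

In this sketch the similarity threshold is the only tuned parameter. Adding a PCA projection of the term-frequency rows before the similarity step would introduce the second parameter the abstract mentions, and the grid search would then run over both.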

Citation

Wilczek, S., Gawrysiak, K., & Spinczyk, D. (2019). Similarity search for the content of medical records using unstructured data. In Advances in Intelligent Systems and Computing (Vol. 762, pp. 506–517). Springer Verlag. https://doi.org/10.1007/978-3-319-91211-0_44
