Term-BLAST-like alignment tool for concept recognition in noisy clinical texts

Abstract

Motivation: Methods for concept recognition (CR) in clinical texts have largely been tested on abstracts or articles from the medical literature. However, texts from electronic health records (EHRs) frequently contain spelling errors, abbreviations, and other nonstandard ways of representing clinical concepts.

Results: Here, we present a method inspired by the BLAST algorithm for biosequence alignment. It screens texts for potential matches on the basis of matching k-mer counts, then scores candidates by their conformance to typical patterns of spelling errors derived from 2.9 million clinical notes. Our method, the Term-BLAST-like alignment tool (TBLAT), leverages a gold standard corpus for typographical errors to implement a sequence alignment-inspired method for efficient entity linkage. We present a comprehensive experimental comparison of TBLAT with five widely used tools. Experimental results show a 10% increase in recall on scientific publications and a 20% increase in recall on EHRs (when compared against the next best method), hence supporting a significant enhancement of the entity linking task. The method can be used stand-alone or as a complement to existing approaches.
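The screening step described in the abstract, matching on k-mer counts, can be illustrated with a minimal Python sketch. This is not the TBLAT implementation: the function names (`kmers`, `shared_kmers`, `screen`), the choice of k = 3, and the `min_fraction` threshold are all assumptions made for illustration, and the scoring of candidates against spelling-error patterns is not shown.

```python
from collections import Counter


def kmers(text, k=3):
    """Count the overlapping character k-mers of a lowercased string."""
    s = text.lower()
    return Counter(s[i:i + k] for i in range(len(s) - k + 1))


def shared_kmers(a, b, k=3):
    """Number of k-mers two strings have in common (multiset intersection)."""
    return sum((kmers(a, k) & kmers(b, k)).values())


def screen(window, terms, k=3, min_fraction=0.5):
    """Keep the terms that share enough k-mers with a text window.

    min_fraction is a hypothetical threshold: the fraction of a term's
    k-mers that must also occur in the window for it to be a candidate.
    """
    hits = []
    for term in terms:
        total = max(len(term) - k + 1, 1)  # number of k-mers in the term
        shared = shared_kmers(window, term, k)
        if shared / total >= min_fraction:
            hits.append((term, shared))
    # Best-supported candidates first.
    return sorted(hits, key=lambda hit: -hit[1])


terms = ["seizure", "scoliosis", "macrocephaly"]
candidates = screen("macrocephlay", terms)  # misspelled "macrocephaly"
print(candidates)
```

Under these assumptions, a misspelled term still shares most of its k-mers with the intended concept label, so it passes the screen, while unrelated terms are discarded cheaply before any more expensive alignment or scoring.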

Cite

Groza, T., Wu, H., Dinger, M. E., Danis, D., Hilton, C., Bagley, A., … Robinson, P. N. (2023). Term-BLAST-like alignment tool for concept recognition in noisy clinical texts. Bioinformatics, 39(12). https://doi.org/10.1093/bioinformatics/btad716