Validating a strategy for psychosocial phenotyping using a large corpus of clinical text


Abstract

Objective: To develop algorithms to improve the efficiency of patient phenotyping using natural language processing (NLP) on text data. Among the many note titles available in our database, we sought to identify those with the highest yield and precision for psychosocial concepts.

Materials and methods: From a database of over 1 billion documents from US Department of Veterans Affairs medical facilities, a random sample of 1500 documents from each of 218 enterprise note titles was chosen. Psychosocial concepts were extracted using a UIMA-AS-based NLP pipeline (v3NLP), using a lexicon of relevant concepts with negation and template format annotators. Human reviewers evaluated a subset of documents for false positives and sensitivity. High-yield documents were identified by hit rate and precision. Reasons for false positivity were characterized.

Results: A total of 58 707 psychosocial concepts were identified from 316 355 documents, for an overall hit rate of 0.2 concepts per document (median 0.1, range 0 to 1.6). Of 6031 concepts reviewed from a high-yield set of note titles, the overall precision across all concept categories was 80%, with variability among note titles and concept categories. Reasons for false positivity included templating, negation, context, and alternate meanings of words. The sensitivity of the NLP system was 49% (95% CI 43% to 55%).

Conclusions: Phenotyping using NLP need not involve the entire document corpus. Our methods offer a generalizable strategy for scaling NLP pipelines to large free-text corpora with complex linguistic annotations when attempting to identify patients of a certain phenotype.
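
The hit rate and precision reported above are simple ratios. As a minimal sketch (not the authors' code), the following Python reproduces the overall hit rate from the reported counts and shows how precision and an approximate 95% confidence interval could be computed. The function names are illustrative. The normal-approximation (Wald) interval is an assumption, since the abstract does not state which interval method was used, and the counts passed to `precision` and `wald_interval` are hypothetical placeholders rather than study data.

```python
import math


def hit_rate(n_concepts: int, n_documents: int) -> float:
    """Concepts extracted per document."""
    return n_concepts / n_documents


def precision(true_positives: int, n_reviewed: int) -> float:
    """Fraction of reviewed concepts that reviewers judged correct."""
    return true_positives / n_reviewed


def wald_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation confidence interval for a proportion.

    Assumption: the abstract does not say how its 95% CI was derived;
    the Wald interval is used here only for illustration.
    """
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half_width), min(1.0, p + half_width)


# Counts reported in the abstract.
overall_hit_rate = hit_rate(58_707, 316_355)
print(f"overall hit rate: {overall_hit_rate:.1f} concepts per document")

# Hypothetical counts, for illustration only (not taken from the study).
example_precision = precision(80, 100)
example_low, example_high = wald_interval(49, 100)
print(f"example precision: {example_precision:.0%}")
print(f"example 95% CI: {example_low:.1%} to {example_high:.1%}")
```

Ranking note titles by these two ratios is one way to shortlist a high-yield subset before running a full pipeline over the whole corpus.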


Cite

Gundlapalli, A. V., Redd, A., Carter, M., Divita, G., Shen, S., Palmer, M., & Samore, M. H. (2013). Validating a strategy for psychosocial phenotyping using a large corpus of clinical text. Journal of the American Medical Informatics Association, 20(E2). https://doi.org/10.1136/amiajnl-2013-001946
