Validating a strategy for psychosocial phenotyping using a large corpus of clinical text

Adi V. Gundlapalli; Andrew Redd; Marjorie Carter; Guy Divita; Shuying Shen; Miland Palmer; Matthew H. Samore

Journal of the American Medical Informatics Association (2013) 20(E2)

DOI: 10.1136/amiajnl-2013-001946

Abstract

Objective: To develop algorithms to improve the efficiency of patient phenotyping using natural language processing (NLP) on text data. Of a large number of note titles available in our database, we sought to determine those with the highest yield and precision for psychosocial concepts.

Materials and methods: From a database of over 1 billion documents from US Department of Veterans Affairs medical facilities, a random sample of 1500 documents from each of 218 enterprise note titles was chosen. Psychosocial concepts were extracted using a UIMA-AS-based NLP pipeline (v3NLP), using a lexicon of relevant concepts with negation and template format annotators. Human reviewers evaluated a subset of documents for false positives and sensitivity. High-yield documents were identified by hit rate and precision. Reasons for false positivity were characterized.

Results: A total of 58 707 psychosocial concepts were identified from 316 355 documents for an overall hit rate of 0.2 concepts per document (median 0.1, range 0 to 1.6). Of 6031 concepts reviewed from a high-yield set of note titles, the overall precision for all concept categories was 80%, with variability among note titles and concept categories. Reasons for false positivity included templating, negation, context, and alternate meaning of words. The sensitivity of the NLP system was noted to be 49% (95% CI 43% to 55%).

Conclusions: Phenotyping using NLP need not involve the entire document corpus. Our methods offer a generalizable strategy for scaling NLP pipelines to large free-text corpora with complex linguistic annotations in attempts to identify patients of a certain phenotype.
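The two selection metrics named in the abstract, hit rate (concepts extracted per document) and precision (share of reviewed concepts judged correct), can be illustrated with a minimal sketch. This is not the authors' v3NLP code: the `TitleStats` class, the `high_yield` helper, the note title names, the per-title counts, and the thresholds are all hypothetical and chosen only for illustration. Only the corpus totals (58 707 concepts, 316 355 documents) come from the abstract.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TitleStats:
    """Counts for one note title (hypothetical structure, not from the paper)."""
    title: str
    documents: int           # documents sampled for this note title
    concepts: int            # concepts extracted by the NLP pipeline
    reviewed: int = 0        # extracted concepts checked by human reviewers
    true_positives: int = 0  # reviewed concepts judged correct

    @property
    def hit_rate(self) -> float:
        # Concepts per document.
        return self.concepts / self.documents if self.documents else 0.0

    @property
    def precision(self) -> Optional[float]:
        # Undefined until at least one concept has been reviewed.
        return self.true_positives / self.reviewed if self.reviewed else None


def high_yield(stats: List[TitleStats],
               min_hit_rate: float,
               min_precision: float) -> List[TitleStats]:
    """Keep titles meeting both thresholds, ordered by hit rate (highest first)."""
    keep = [
        s for s in stats
        if s.hit_rate >= min_hit_rate
        and s.precision is not None
        and s.precision >= min_precision
    ]
    return sorted(keep, key=lambda s: s.hit_rate, reverse=True)


# Corpus totals reported in the abstract.
overall_hit_rate = 58_707 / 316_355

# Invented per-title counts, for illustration only.
example = [
    TitleStats("SOCIAL WORK NOTE (hypothetical)", 1500, 2400, 200, 170),
    TitleStats("RADIOLOGY REPORT (hypothetical)", 1500, 15, 10, 9),
    TitleStats("NURSING NOTE (hypothetical)", 1500, 900, 100, 60),
]

if __name__ == "__main__":
    print(f"overall hit rate: {overall_hit_rate:.2f}")
    for s in high_yield(example, min_hit_rate=0.5, min_precision=0.75):
        print(s.title, round(s.hit_rate, 2), round(s.precision, 2))
```

Filtering on both metrics reflects the abstract's point that a note title is worth processing only if it yields many concepts and those concepts are usually correct; the specific cut-offs used here are assumptions, since the abstract does not state them.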

Citation (APA)

Gundlapalli, A. V., Redd, A., Carter, M., Divita, G., Shen, S., Palmer, M., & Samore, M. H. (2013). Validating a strategy for psychosocial phenotyping using a large corpus of clinical text. Journal of the American Medical Informatics Association, 20(E2). https://doi.org/10.1136/amiajnl-2013-001946
