BACKGROUND: Extra-intestinal manifestations (EIMs) occur in nearly 40% of patients with IBD and affect both the disease experience and therapeutic decision-making, but they are poorly captured by administrative codes. We aimed to pilot natural language processing (NLP) methods to characterize EIMs using office notes.
METHODS: Subjects with a diagnosis of IBD were identified through a single-center retrospective review of electronic health records (EHR) from 2014 to 2017. Gastroenterology (GI) office notes were collected and annotated by two reviewers for the presence and activity of EIMs. EIM concepts in the notes were identified using NLP methods that leveraged UMLS libraries and hand-crafted features. Each identified EIM was characterized from a ±25-word window around the mention and classified as inactive (negated, historical, or resolved) or active (improved, worsened, or active but unchanged). When an EIM was referenced more than once in a document, its status was inferred with section-based weighting, ranked from greatest to least weight as assessment/plan, subjective, past history, exam, and other. EIM status was classified as ambiguous when the same document contained conflicting references of approximately equal weight. Model development and testing used an 80/20 split of the dataset.
RESULTS: Among 4,108 unique IBD patients, 1,640 (39.9%) had at least one EIM identified. The mean age was 41.9 years, 47.2% were male, and 27.0% had biologic exposure. A total of 1,240 documents (first GI notes) were manually annotated for EIMs. The annotated EIMs were 51.1% arthritis, 16.5% ocular, and 16.2% psoriasis, with the low-frequency EIMs erythema nodosum, pyoderma gangrenosum, and hidradenitis suppurativa together making up the remaining 16.2%.
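The status-inference rules described under METHODS can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not publish its cue lexicons, its UMLS concept matching, its numeric section weights, or its threshold for "approximately equal weight". The cue words, the exact-token match standing in for UMLS lookup, the weights 5 to 1 (which only preserve the stated ranking), and the 20% tolerance are therefore all hypothetical. Each mention is labeled from cue words in its ±25-word window, collapsed to active or inactive, and the section weights are summed per label; the document is flagged ambiguous when both labels are present with weights within the tolerance.

```python
import re
from collections import defaultdict

WINDOW = 25  # +/-25-word context window around each mention, as in the abstract

# Hypothetical single-word cues per status; the abstract does not publish its lexicon.
# Statuses are checked in this order, so the first matching status wins.
STATUS_CUES = {
    "negated": {"no", "denies", "without"},
    "historical": {"history", "previously", "prior"},
    "resolved": {"resolved", "remission"},
    "improved": {"improved", "improving", "better"},
    "worsened": {"worse", "worsening", "flare"},
}
INACTIVE = {"negated", "historical", "resolved"}

# Hypothetical numeric weights that only preserve the ranking stated in the abstract.
SECTION_WEIGHTS = {
    "assessment/plan": 5,
    "subjective": 4,
    "past history": 3,
    "exam": 2,
    "other": 1,
}


def context_window(tokens, idx, size=WINDOW):
    """Return up to `size` tokens on each side of tokens[idx], inclusive."""
    return tokens[max(0, idx - size): idx + size + 1]


def mention_status(window):
    """Label one mention from cue words in its window; default is active but unchanged."""
    words = set(window)
    for status, cues in STATUS_CUES.items():
        if words & cues:
            return status
    return "active unchanged"


def find_mentions(note_sections, term):
    """Exact-token match standing in for UMLS concept lookup (hypothetical simplification)."""
    mentions = []
    for section, text in note_sections:
        tokens = re.findall(r"[a-z]+", text.lower())
        for i, tok in enumerate(tokens):
            if tok == term:
                mentions.append((section, mention_status(context_window(tokens, i))))
    return mentions


def infer_status(mentions, tolerance=0.2):
    """Combine (section, status) mentions into one document-level status.

    Weights are summed separately for active and inactive mentions; when both
    are present and differ by at most `tolerance` of the total (a hypothetical
    threshold), the document is flagged ambiguous.
    """
    scores = defaultdict(float)
    for section, status in mentions:
        label = "inactive" if status in INACTIVE else "active"
        scores[label] += SECTION_WEIGHTS.get(section, SECTION_WEIGHTS["other"])
    active, inactive = scores["active"], scores["inactive"]
    total = active + inactive
    if total == 0:
        return "absent"
    if active > 0 and inactive > 0 and abs(active - inactive) <= tolerance * total:
        return "ambiguous"
    return "active" if active > inactive else "inactive"


note = [
    ("past history", "History of arthritis in both knees."),
    ("assessment/plan", "Arthritis is worse this month; will start therapy."),
]
print(infer_status(find_mentions(note, "arthritis")))  # prints: active
```

Here the historical mention (weight 3) is outweighed by the worsening mention in the assessment/plan (weight 5), so the note resolves to active. Summing weights is only one plausible reading of "section-based weighting"; letting the single highest-ranked section decide would be an equally reasonable alternative.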
NLP models performed well in classifying both EIM presence and status, with an overall accuracy of 91.2%, specificity of 92.9%, and sensitivity of 81.8% across all EIMs in notes automatically classified as non-ambiguous (Table 1). NLP methods classified EIM status as ambiguous in 38.9% of cases, dominated by arthritis, the most common EIM type. On qualitative note review by a gastroenterologist, 55% of the EIMs that NLP methods had classified as ambiguous were judged to have an unclear status.
CONCLUSIONS: NLP methods can detect and classify EIMs with reasonable performance and efficiency compared with traditional manual chart review. Although variation and ambiguity in source documents present challenges, estimating granular, individual-level disease features at scale with NLP offers exciting possibilities for population-based research and decision support.
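The headline metrics in RESULTS follow the standard confusion-matrix definitions, computed only over notes not flagged as ambiguous. The sketch below assumes a binary label per EIM (for example, whether the EIM is active), because the abstract does not say how its multi-class statuses were mapped to sensitivity and specificity. The `summarize` function and the toy labels are hypothetical and do not reproduce the study's data.

```python
import math


def _ratio(num, den):
    """Safe division: NaN when the denominator is zero."""
    return num / den if den else math.nan


def summarize(records):
    """records: (predicted, gold, ambiguous) triples with boolean labels.

    Accuracy, sensitivity, and specificity use only non-ambiguous records;
    the ambiguous fraction is taken over all records.
    """
    kept = [(pred, gold) for pred, gold, ambiguous in records if not ambiguous]
    tp = sum(1 for pred, gold in kept if pred and gold)
    tn = sum(1 for pred, gold in kept if not pred and not gold)
    fp = sum(1 for pred, gold in kept if pred and not gold)
    fn = sum(1 for pred, gold in kept if not pred and gold)
    return {
        "accuracy": _ratio(tp + tn, len(kept)),
        "sensitivity": _ratio(tp, tp + fn),  # true-positive rate
        "specificity": _ratio(tn, tn + fp),  # true-negative rate
        "ambiguous_fraction": _ratio(len(records) - len(kept), len(records)),
    }


# Hypothetical toy labels, not study data.
toy = [
    (True, True, False), (True, True, False),
    (False, False, False), (False, False, False), (False, False, False),
    (True, False, False), (False, True, False),
    (True, True, True),
]
print(summarize(toy))
```

On these eight toy records, the one ambiguous record is excluded from the confusion matrix, leaving 2 true positives, 3 true negatives, 1 false positive, and 1 false negative: accuracy 5/7, sensitivity 2/3, specificity 3/4, and an ambiguous fraction of 1/8.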
Stidham, R., Yu, D., Lahiri, S., & Vydiswaran, V. (2020). P311 Detection and characterisation of extra-intestinal manifestations of IBD in clinical office notes using natural language processing. Journal of Crohn’s and Colitis, 14(Supplement_1), S309–S310. https://doi.org/10.1093/ecco-jcc/jjz203.440