Objective: Automatically identifying patients at risk of immune checkpoint inhibitor (ICI)-induced colitis allows physicians to improve patient care. However, predictive models require training data curated from electronic health records (EHR). Our objective is to automatically identify notes documenting ICI-colitis cases to accelerate data curation. Materials and Methods: We present a data pipeline that automatically identifies ICI-colitis from EHR notes, accelerating chart review. The pipeline relies on BERT, a state-of-the-art natural language processing (NLP) model. The first stage of the pipeline segments long notes using keywords identified through a logistic classifier and applies BERT to identify ICI-colitis notes. The next stage uses a second BERT model, tuned to identify false positives, to remove notes that were likely flagged as positive only because they mention colitis as a side effect. The final stage further accelerates curation by highlighting the colitis-relevant portions of notes. Specifically, we use BERT's attention scores to find high-density regions describing colitis. Results: The overall pipeline identified colitis notes with 84% precision and reduced the curator note review load by 75%. The segment BERT classifier had a high recall of 0.98, which is crucial given the low incidence (<10%) of colitis. Discussion: Curation from EHR notes is a burdensome task, especially when the curation topic is complicated. The methods described in this work are useful not only for ICI-colitis but can also be adapted to other domains. Conclusion: Our extraction pipeline reduces manual note review load and makes EHR data more accessible for research.
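The highlighting stage described under Materials and Methods can be illustrated with a minimal sketch. It assumes that each token already carries a single attention-derived score and that a "high-density region" is the fixed-width window with the largest total score; the abstract does not specify either detail, so the function name `densest_window`, the window width, and the example scores are all hypothetical.

```python
from typing import List, Tuple


def densest_window(scores: List[float], width: int) -> Tuple[int, int]:
    """Return (start, end) of the contiguous window of `width` tokens
    with the largest total score. `end` is exclusive."""
    if width <= 0:
        raise ValueError("width must be positive")
    if len(scores) <= width:
        return 0, len(scores)

    # Sum of the first window, then slide one token at a time.
    window_sum = sum(scores[:width])
    best_sum, best_start = window_sum, 0
    for start in range(1, len(scores) - width + 1):
        # Add the token entering the window, drop the one leaving it.
        window_sum += scores[start + width - 1] - scores[start - 1]
        if window_sum > best_sum:
            best_sum, best_start = window_sum, start
    return best_start, best_start + width


# Hypothetical note fragment with made-up per-token scores
# (stand-ins for attention weights; not taken from the paper).
tokens = ("patient denies fever but reports severe diarrhea "
          "and colitis after nivolumab").split()
scores = [0.01, 0.02, 0.02, 0.01, 0.05, 0.15, 0.25, 0.04, 0.30, 0.05, 0.10]

start, end = densest_window(scores, width=4)
highlight = " ".join(tokens[start:end])
print(highlight)
```

In practice the scores would come from the model's attention weights, and a curator-facing tool would likely highlight several regions rather than one; this sketch only shows the sliding-window idea.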
CITATION STYLE
Rahman, P., Ye, C., Mittendorf, K. F., Lenoue-Newton, M., Micheel, C., Wolber, J., … Fabbri, D. (2023). Accelerated curation of checkpoint inhibitor-induced colitis cases from electronic health records. JAMIA Open, 6(1). https://doi.org/10.1093/jamiaopen/ooad017