Social determinants of health extraction from clinical notes across institutions using large language models

4Citations
Citations of this article
44Readers
Mendeley users who have this article in their library.

This article is free to access.

Abstract

Detailed social determinants of health (SDoH) is often buried within clinical text in EHRs. Most current NLP efforts for SDoH have limitations, investigating limited factors, deriving data from a single institution, using specific patient cohorts/note types, with reduced focus on generalizability. We aim to address these issues by creating cross-institutional corpora and developing and evaluating the generalizability of classification models, including large language models (LLMs), for detecting SDoH factors using data from four institutions. Clinical notes were annotated with 21 SDoH factors at two levels: level 1 (SDoH factors only) and level 2 (SDoH factors and associated values). Compared to other models, instruction tuned LLM achieved top performance with micro-averaged F1 over 0.9 on level 1 corpora and over 0.84 on level 2 corpora. While models performed well when trained and tested on individual datasets, cross-dataset generalization highlighted remaining obstacles. Access to trained models will be made available at https://github.com/BIDS-Xu-Lab/LLMs4SDoH.

Cite

CITATION STYLE

APA

Keloth, V. K., Selek, S., Chen, Q., Gilman, C., Fu, S., Dang, Y., … Xu, H. (2025). Social determinants of health extraction from clinical notes across institutions using large language models. Npj Digital Medicine, 8(1). https://doi.org/10.1038/s41746-025-01645-8

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free