Background: Approximately 20% of deaths in the US each year are attributable to smoking, yet current practices in the recording of this health risk in electronic health records (EHRs) have not led to discernible changes in health outcomes. Several groups have developed algorithms for extracting smoking behaviors from clinical notes, but none of these approaches has been assessed against external data to estimate expected clinical performance. Methods: Previously, we developed an informatics pipeline that extracts smoking status, pack-year history, and cessation date from clinical notes. Here we report on the clinical implementation performance of our pipeline using 1,504 clinical notes matched to an external questionnaire. Results: We found that 73% of available notes contained no smoking behavior information. The weighted Cohen's kappa between the external questionnaire and EHR smoking status was 0.62 (95% CI 0.56-0.69) for the clinical notes from which we were able to extract information. The correlation between pack-years reported by our pipeline and the external questionnaire was 0.39 in the 81 notes for which this information was present in both sources. We also assessed lung cancer screening eligibility using notes from individuals identified as never smokers or as smokers with a pack-year history extracted by our pipeline (n = 196). We found a positive predictive value of 85.4%, a negative predictive value of 83.8%, a sensitivity of 63.1%, and a specificity of 94.7%. Conclusions: We have demonstrated that our pipeline can extract smoking behaviors from unannotated EHR notes when the information is present. This information is reliable enough to identify patients most likely to be eligible for smoking-related services. Ensuring capture of smoking information during clinical encounters should continue to be a high priority.
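For readers who want to see how the four screening-eligibility measures in the Results relate to one another, the following is a minimal Python sketch that computes them from a 2x2 confusion matrix. The function name `screening_metrics` and the cell counts are illustrative assumptions: the abstract does not report the underlying table, so the counts here were chosen only because they sum to n = 196 and reproduce the reported percentages after rounding. They should not be read as the study's actual data.

```python
def screening_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute standard diagnostic-accuracy measures from 2x2 counts.

    tp/fp/fn/tn are true positives, false positives, false negatives,
    and true negatives, with the external questionnaire as the reference.
    """
    return {
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


# Hypothetical counts (NOT taken from the paper): picked so that they
# total 196 and match the percentages quoted in the abstract.
counts = {"tp": 41, "fp": 7, "fn": 24, "tn": 124}

metrics = screening_metrics(**counts)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
# ppv: 85.4%
# npv: 83.8%
# sensitivity: 63.1%
# specificity: 94.7%
```

Under these assumed counts, the high specificity alongside the lower sensitivity means that notes flagged as eligible are usually correct, while a sizeable share of truly eligible individuals would be missed.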
Palmer, E. L., Higgins, J., Hassanpour, S., Sargent, J., Robinson, C. M., Doherty, J. A., & Onega, T. (2019). Assessing data availability and quality within an electronic health record system through external validation against an external clinical data source. BMC Medical Informatics and Decision Making, 19(1). https://doi.org/10.1186/s12911-019-0864-2