Comparison of natural language processing biosurveillance methods for identifying influenza from encounter notes

  • Elkin P
  • Froehling D
  • Wahner-Roedler D
 et al. 

BACKGROUND: An effective national biosurveillance system expedites outbreak recognition and facilitates response coordination at the federal, state, and local levels. The BioSense system, used at the Centers for Disease Control and Prevention, incorporates chief complaints but not data from the whole encounter note into its surveillance algorithms.

OBJECTIVE: To evaluate whether biosurveillance by using data from the whole encounter note is superior to that using data from the chief complaint field alone.

DESIGN: 6-year retrospective case-control cohort study.

SETTING: Mayo Clinic, Rochester, Minnesota.

PARTICIPANTS: 17,243 persons tested for influenza A or B virus between 1 January 2000 and 31 December 2006.

MEASUREMENTS: The accuracy of a model based on signs and symptoms to predict influenza virus infection in patients with upper respiratory tract symptoms, and the ability of a natural language processing technique to identify definitional clinical features from free-text encounter notes.

RESULTS: Surveillance based on the whole encounter note was superior to the chief complaint field alone. For the case definition used by surveillance of the whole encounter note, the normalized partial area under the receiver-operating characteristic curve (specificity, 0.1 to 0.4) for surveillance using the whole encounter note was 92.9% versus 70.3% for surveillance with the chief complaint field (difference, 22.6%; P < 0.001). Comparison of the 2 models at the fixed specificity of 0.4 resulted in sensitivities of 89.0% and 74.4%, respectively (P < 0.001). The relative risk for missing a true case of influenza was 2.3 by using the chief complaint field model.

LIMITATIONS: Participants were seen at 1 tertiary referral center. The cost of comprehensive biosurveillance monitoring was not studied.

CONCLUSION: A biosurveillance model for influenza using the whole encounter note is more accurate than a model that uses only the chief complaint field. Because case-defining signs and symptoms of influenza are commonly available in health records, the investigators believe that the national strategy for biosurveillance should be changed to incorporate data from the whole health record.

PRIMARY FUNDING SOURCE: Centers for Disease Control and Prevention.
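
The reported relative risk of 2.3 appears consistent with the two sensitivities given in the RESULTS. A minimal sketch of that arithmetic follows, assuming (the abstract does not state this) that the relative risk is the ratio of the two models' miss rates, where a miss rate is 1 minus sensitivity. The function names are illustrative only and are not taken from the study.

```python
def miss_rate(sensitivity: float) -> float:
    """Fraction of true influenza cases a model fails to flag (1 - sensitivity)."""
    return 1.0 - sensitivity


def relative_risk_of_miss(sens_reference: float, sens_comparison: float) -> float:
    """Ratio of the comparison model's miss rate to the reference model's.

    Assumption: this ratio is how the abstract's relative risk was derived.
    """
    return miss_rate(sens_comparison) / miss_rate(sens_reference)


# Sensitivities reported in the abstract at a fixed specificity of 0.4.
SENS_WHOLE_NOTE = 0.890       # whole encounter note model
SENS_CHIEF_COMPLAINT = 0.744  # chief complaint field model

rr = relative_risk_of_miss(SENS_WHOLE_NOTE, SENS_CHIEF_COMPLAINT)
print(f"Relative risk of missing a true case: {rr:.1f}")
```

Under this assumption the chief complaint model misses about 25.6% of true cases against about 11.0% for the whole-note model, and the ratio rounds to the 2.3 quoted in the abstract.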


  • Peter L. Elkin

  • David A. Froehling

  • Dietlind L. Wahner-Roedler

  • Steven H. Brown

  • Kent R. Bailey
