Identifying COVID-19 Outbreaks From Contact-Tracing Interview Forms for Public Health Departments: Development of a Natural Language Processing Pipeline

Abstract

Background: In Wisconsin, COVID-19 case interview forms contain free-text fields that need to be mined to identify potential outbreaks for targeted policy making. We developed an automated pipeline to ingest the free text into a pretrained neural language model to identify businesses and facilities as outbreaks.

Objective: We aimed to examine the precision and recall of our natural language processing pipeline against existing outbreaks and potentially new clusters.

Methods: Data on cases of COVID-19 were extracted from the Wisconsin Electronic Disease Surveillance System (WEDSS) for Dane County between July 1, 2020, and June 30, 2021. Features from the case interview forms were fed into a Bidirectional Encoder Representations from Transformers (BERT) model that was fine-tuned for named entity recognition (NER). We also developed a novel location-mapping tool to provide addresses for relevant NER. Precision and recall were measured against manually verified outbreaks and valid addresses in WEDSS.

Results: There were 46,798 cases of COVID-19, with 4,183,273 total BERT tokens and 15,051 unique tokens. The recall and precision of the NER tool were 0.67 (95% CI 0.66-0.68) and 0.55 (95% CI 0.54-0.57), respectively. For the location-mapping tool, the recall and precision were 0.93 (95% CI 0.92-0.95) and 0.93 (95% CI 0.92-0.95), respectively. Across monthly intervals, the NER tool identified more potential clusters than were verified in WEDSS.

Conclusions: We developed a novel pipeline of tools that identified existing outbreaks and novel clusters with associated addresses. Our pipeline ingests data from a statewide database and may be deployed to assist local health departments for targeted interventions.
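The evaluation reported in the abstract compares entities flagged by the pipeline with manually verified outbreaks and summarizes the agreement as precision and recall with 95% CIs. The following is a minimal sketch of that kind of calculation, not the authors' code. The abstract does not state how the intervals were computed, so the Wilson score interval used here is an assumption, and the function names and facility names are hypothetical placeholders.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%).

    Assumption: the abstract does not name its CI method; Wilson is one
    common choice for proportions such as precision and recall.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def precision_recall(predicted, verified):
    """Compare predicted outbreak sites with verified ones (set overlap)."""
    predicted, verified = set(predicted), set(verified)
    if not predicted or not verified:
        raise ValueError("both sets must be non-empty")
    true_pos = len(predicted & verified)
    return {
        # Share of flagged sites that were verified outbreaks.
        "precision": true_pos / len(predicted),
        "precision_ci": wilson_interval(true_pos, len(predicted)),
        # Share of verified outbreaks that the tool flagged.
        "recall": true_pos / len(verified),
        "recall_ci": wilson_interval(true_pos, len(verified)),
    }


# Hypothetical site names, for illustration only (not data from the study).
predicted_sites = {"cafe a", "gym b", "school c", "bar d"}
verified_sites = {"cafe a", "gym b", "clinic e"}

scores = precision_recall(predicted_sites, verified_sites)
for name, value in scores.items():
    print(name, value)
```

With these placeholder sets, two of the four flagged sites are verified, so the sketch scores exact string matches only. A real comparison would first need to normalize business and facility names.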

Caskey, J., McConnell, I. L., Oguss, M., Dligach, D., Kulikoff, R., Grogan, B., … Afshar, M. (2022). Identifying COVID-19 Outbreaks From Contact-Tracing Interview Forms for Public Health Departments: Development of a Natural Language Processing Pipeline. JMIR Public Health and Surveillance, 8(3). https://doi.org/10.2196/36119
