Abstract WP382: Using Natural Language Processing Algorithms to Identify Stroke Cases and Stroke Subtypes From Neuroimaging Reports

Carin Northuis; Martin Michalowski; Kamakshi Lakshminarayan

Journal Article

Stroke (2019) 50(Suppl_1)

DOI: 10.1161/str.50.suppl_1.wp382

Abstract

Background/Objective: The long-term goal of our research is to develop automated, accurate methods for conducting stroke outcome surveillance across large populations. To facilitate this, we developed natural language processing (NLP)-based machine learning algorithms to classify stroke cases and stroke sub-types from neuroimaging reports. We report on the performance of these algorithms.

Methods: Our population of interest included patients with stroke symptoms presenting to the emergency room of our large academic healthcare system. We randomly sampled 332 probable stroke cases. A trained neurologist validated stroke diagnoses using neuroimaging reports. Data preprocessing included cleaning and normalizing the reports into a standardized format. We trained and tested machine learning algorithms using the formatted reports. The NLP-based algorithms predicted a stroke diagnosis (binary classification) and a stroke type diagnosis (multiclass classification) using n-grams of length 1 (e.g., ‘stroke’, ‘hemorrhage’) through 3 (e.g., ‘no mass effect’), term-frequency weighting, and feature dimensionality reduction via truncated singular value decomposition (SVD). The classification methods we tested were Multinomial Naïve Bayes, Logistic Regression, Random Forest Classifier, and Support Vector Machine (SVM). We report algorithm performance using the area under the receiver operating characteristic curve (AUC-ROC).

Results: The highest-performing configuration for both stroke and stroke sub-type classification used n-grams of length 1 to 2, no term-frequency weighting, and 200 SVD components. For stroke case detection, SVM achieved the best AUC-ROC, 95.6%. For stroke sub-type detection, SVM also yielded the highest AUC-ROC: 93.5% for no stroke, 92.3% for ischemic stroke, 91.9% for intraparenchymal hemorrhage, and 94.8% for subarachnoid hemorrhage.

Conclusions: We report promising results from our pilot study using machine learning algorithms to classify stroke cases from neuroimaging reports. Feature selection on this pilot data revealed a subset of words that are highly effective in categorizing stroke. Future work will focus on algorithm improvement, finer-grained stroke sub-type stratification, and multi-label phenotyping.
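The abstract does not include code, so the following is only a minimal sketch of two building blocks named in the Methods: extracting word n-grams of length 1 through 3 from a report, and scoring predictions with AUC-ROC. The function names (`tokenize`, `ngram_counts`, `auc_roc`), the tokenization rule, and the sample report text are assumptions made for illustration, not the authors' implementation. The sketch leaves out the SVD step and the classifiers.

```python
import re
from collections import Counter


def tokenize(report):
    """Lowercase a report and keep alphabetic word tokens (an assumed, simplistic rule)."""
    return re.findall(r"[a-z]+", report.lower())


def ngram_counts(report, min_n=1, max_n=3):
    """Count every word n-gram of length min_n..max_n in a single report.

    Simplification: n-grams are allowed to span sentence boundaries.
    """
    tokens = tokenize(report)
    counts = Counter()
    for n in range(min_n, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


def auc_roc(labels, scores):
    """AUC-ROC computed as the fraction of (positive, negative) pairs in which
    the positive case receives the higher score; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC-ROC needs at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical report text, not taken from the study data.
    features = ngram_counts("No mass effect. No acute hemorrhage.")
    print(features["no"], features["no mass effect"])

    # Hypothetical labels (1 = stroke) and classifier scores.
    print(auc_roc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))
```

In a fuller pipeline, the n-gram counts would become the columns of a document-term matrix, which could then be reduced with truncated SVD before a classifier such as an SVM is trained. For multiclass sub-type labels, a per-class AUC-ROC can be obtained by treating each class in turn as the positive label.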

Cite

Northuis, C., Michalowski, M., & Lakshminarayan, K. (2019). Abstract WP382: Using Natural Language Processing Algorithms to Identify Stroke Cases and Stroke Subtypes From Neuroimaging Reports. Stroke, 50(Suppl_1). https://doi.org/10.1161/str.50.suppl_1.wp382