An exploratory data analysis: the performance differences of a medical code prediction system on different demographic groups

Abstract

Recent studies show that neural natural language processing models for medical code prediction suffer from a label imbalance issue. This study investigates further imbalance in a medical code prediction dataset in terms of demographic variables and analyses performance differences between demographic groups. We use sample-based metrics so that performance is evaluated correctly with respect to the data subject. We also propose a simple label distance metric to quantify the difference in label distribution between a group and the entire dataset. Our analysis reveals that the model performs differently for different demographic groups: significant differences are observed between age groups and between insurance types. Interestingly, we found only a weak positive correlation between the amount of training data for a group and the performance on that group, but a strong negative correlation between the label distance of a group and the performance on that group. This result suggests that the model tends to perform poorly on groups whose label distribution differs from the global label distribution of the training dataset. Further analysis of the model performance is required to identify the cause of these differences and to improve model building.
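The abstract does not define the sample-based metric or the label distance, so the following is only a minimal sketch of how such quantities could be computed. It assumes a per-record F1 averaged over records as the sample-based metric, and total variation distance between normalised label frequencies as the label distance. Both choices, all function names, and the placeholder codes "A", "B", "C" are illustrative assumptions, not the authors' definitions.

```python
from collections import Counter


def sample_f1(gold, pred):
    """F1 for one record, from its gold and predicted code sets (assumed metric)."""
    gold, pred = set(gold), set(pred)
    if not gold and not pred:
        return 1.0  # nothing to predict and nothing predicted
    tp = len(gold & pred)
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


def mean_sample_f1(records):
    """Average per-record F1 over (gold, pred) pairs, e.g. within one group."""
    return sum(sample_f1(g, p) for g, p in records) / len(records)


def label_distribution(label_sets):
    """Relative frequency of each code over a collection of label sets."""
    counts = Counter(code for labels in label_sets for code in labels)
    total = sum(counts.values())
    return {code: n / total for code, n in counts.items()}


def label_distance(group_label_sets, all_label_sets):
    """Total variation distance between group and global label distributions.

    Assumed stand-in for the paper's label distance; 0 means identical
    distributions, 1 means no overlap in labels.
    """
    p = label_distribution(group_label_sets)
    q = label_distribution(all_label_sets)
    codes = set(p) | set(q)
    return 0.5 * sum(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in codes)


# Hypothetical toy data: four records overall, one of them in the group.
all_labels = [{"A", "B"}, {"A"}, {"C"}, {"A", "B"}]
group_labels = [{"C"}]
distance = label_distance(group_labels, all_labels)
```

Under these assumptions, computing `mean_sample_f1` and `label_distance` for each demographic group gives one pair of values per group, and the reported correlation could then be examined with any standard correlation coefficient across groups.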

Cite

Shim, H., Lowet, D., Luca, S., & Vanrumste, B. (2022). An exploratory data analysis: the performance differences of a medical code prediction system on different demographic groups. In ClinicalNLP 2022 - 4th Workshop on Clinical Natural Language Processing, Proceedings (pp. 93–102). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.clinicalnlp-1.10
