The effect of data resampling methods in radiomics

1Citations
Citations of this article
16Readers
Mendeley users who have this article in their library.

This article is free to access.

Abstract

Radiomic datasets can be class-imbalanced, for instance, when the prevalence of diseases varies notably, meaning that the number of positive samples is much smaller than that of negative samples. In these cases, the majority class may dominate the model's training and thus negatively affect the model's predictive performance, leading to bias. Therefore, resampling methods are often utilized to class-balance the data. However, several resampling methods exist, and neither their relative predictive performance nor their impact on feature selection has been systematically analyzed. In this study, we aimed to measure the impact of nine resampling methods on radiomic models utilizing a set of fifteen publicly available datasets regarding their predictive performance. Furthermore, we evaluated the agreement and similarity of the set of selected features. Our results show that applying resampling methods did not improve the predictive performance on average. On specific datasets, slight improvements in predictive performance (+ 0.015 in AUC) could be seen. A considerable disagreement on the set of selected features was seen (only 28.7% of features agreed), which strongly impedes feature interpretability. However, selected features are similar when considering their correlation (82.9% of features correlated on average).

Cite

CITATION STYLE

APA

Demircioğlu, A. (2024). The effect of data resampling methods in radiomics. Scientific Reports, 14(1). https://doi.org/10.1038/s41598-024-53491-5

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free