Background: The lack of nonparametric statistical tests for confounding bias significantly hampers the development of robust, valid, and generalizable predictive models in many fields of research. Here I propose the partial confounder test, which, for a given confounder variable, probes the null hypothesis that the model is unconfounded.

Results: The test provides strict control of the type I error rate and high statistical power, even for non-normally and non-linearly dependent predictions, as often seen in machine learning. Applying the proposed test to models trained on large-scale functional brain connectivity data (N = 1,865) (i) reveals previously unreported confounders and (ii) shows that state-of-the-art confound mitigation approaches may, in several cases, fail to prevent confounding bias.

Conclusions: The proposed test (implemented in the package mlconfound; https://mlconfound.readthedocs.io) can aid the assessment and improvement of the generalizability and validity of predictive models and thereby foster the development of clinically useful machine learning biomarkers.
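As a brief illustration, the snippet below sketches how the test could be applied in practice. It assumes the partial_confound_test function from mlconfound.stats (see https://mlconfound.readthedocs.io for the authoritative API); the synthetic data and the interpretation comment are illustrative assumptions, not results from the study:

    import numpy as np
    from mlconfound.stats import partial_confound_test

    # Simulate a target (y), a confounder (c), and model predictions (yhat)
    # in which the confounder "leaks" into the predictions.
    rng = np.random.default_rng(42)
    n = 500
    c = rng.normal(size=n)                   # confounder
    y = 0.5 * c + rng.normal(size=n)         # target, partly driven by c
    yhat = y + 0.5 * c + rng.normal(size=n)  # predictions biased by c

    # H0: the predictions are conditionally independent of the confounder,
    # given the target (i.e., the model is unconfounded).
    result = partial_confound_test(y, yhat, c)
    print(result)  # reports a permutation-based p-value; a small p suggests
                   # the predictions are confounded by c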