Double machine learning and automated confounder selection: A cautionary tale

Paul Hünermund; Beyers Louw; Itamar Caspi

Journal Article, Open Access

Journal of Causal Inference (2023) 11(1)

DOI: 10.1515/jci-2022-0078

21 Citations

37 Readers

Abstract

Double machine learning (DML) has become an increasingly popular tool for automated variable selection in high-dimensional settings. Even though the ability to deal with a large number of potential covariates can render selection-on-observables assumptions more plausible, there is at the same time a growing risk that endogenous variables are included, which would lead to the violation of conditional independence. This article demonstrates that DML is very sensitive to the inclusion of only a few "bad controls" in the covariate space. The resulting bias varies with the nature of the theoretical causal model, which raises concerns about the feasibility of selecting control variables in a data-driven way.
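
The sensitivity to a single bad control can be illustrated with a small simulation. The following is a minimal sketch and not the authors' code or simulation design: the data-generating process, the variable names (`x` for a confounder, `c` for a collider caused by both treatment and outcome), the sample size, and the true effect `theta = 1.0` are all assumptions chosen for illustration. For simplicity the nuisance functions are fitted with ordinary least squares rather than the machine-learning learners (for example lasso or random forests) normally used in DML; the cross-fitted partialling-out step is otherwise the standard one.

```python
import random


def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def fit_ols(rows, target):
    """Return OLS coefficients (intercept first) via the normal equations."""
    Z = [[1.0] + list(row) for row in rows]
    k = len(Z[0])
    ztz = [[sum(z[i] * z[j] for z in Z) for j in range(k)] for i in range(k)]
    zty = [sum(z[i] * t for z, t in zip(Z, target)) for i in range(k)]
    return solve(ztz, zty)


def predict(beta, row):
    return beta[0] + sum(b * v for b, v in zip(beta[1:], row))


def dml_partial_out(y, d, controls, n_folds=2):
    """Cross-fitted partialling-out estimate of the effect of d on y."""
    n = len(y)
    res_y, res_d = [0.0] * n, [0.0] * n
    for fold in range(n_folds):
        train = [i for i in range(n) if i % n_folds != fold]
        held_out = [i for i in range(n) if i % n_folds == fold]
        rows = [controls[i] for i in train]
        # Nuisance models are linear here (an assumption, see lead-in).
        beta_y = fit_ols(rows, [y[i] for i in train])
        beta_d = fit_ols(rows, [d[i] for i in train])
        for i in held_out:
            res_y[i] = y[i] - predict(beta_y, controls[i])
            res_d[i] = d[i] - predict(beta_d, controls[i])
    # Final stage: regress outcome residuals on treatment residuals.
    return sum(a * b for a, b in zip(res_d, res_y)) / sum(a * a for a in res_d)


def simulate(n=4000, theta=1.0, seed=0):
    """Hypothetical design: x confounds d and y; c is caused by d and y."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    d = [xi + rng.gauss(0, 1) for xi in x]
    y = [theta * di + xi + rng.gauss(0, 1) for di, xi in zip(d, x)]
    c = [di + yi + rng.gauss(0, 1) for di, yi in zip(d, y)]
    good = dml_partial_out(y, d, [[xi] for xi in x])
    bad = dml_partial_out(y, d, [[xi, ci] for xi, ci in zip(x, c)])
    return good, bad


theta_good, theta_bad = simulate()
print(f"confounder only: {theta_good:.2f}; with collider: {theta_bad:.2f}")
```

In this particular design, conditioning on the confounder alone should recover an estimate close to the assumed true effect of 1, while adding the collider pulls the estimate toward zero. Other causal structures would produce biases of different sign and size, which is consistent with the abstract's point that the bias depends on the underlying causal model.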

Cite

Hünermund, P., Louw, B., & Caspi, I. (2023). Double machine learning and automated confounder selection: A cautionary tale. Journal of Causal Inference, 11(1). https://doi.org/10.1515/jci-2022-0078
