This article introduces a collection of four datasets, similar to Anscombe’s quartet, that highlight the challenges of estimating causal effects. Each dataset is generated by a distinct causal mechanism: the first involves a collider, the second a confounder, the third a mediator, and the fourth M-bias induced by an included factor. The article includes a mathematical summary of each dataset, as well as directed acyclic graphs that depict the relationships between the variables. Although the statistical summaries and visualizations for each dataset are identical, the true causal effect differs, and estimating it correctly requires knowledge of the data-generating mechanism. These example datasets can help practitioners better understand the assumptions underlying causal inference methods, and they emphasize the importance of gathering information beyond what statistical tools alone can provide. The article also includes R code for reproducing all figures and provides access to the datasets through an R package named “quartets.” Supplementary materials for this article are available online.
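To make the central claim concrete, here is a minimal simulation sketch of why the same adjustment can help under one mechanism and hurt under another. It is not the article's code or data: the structural equations, the coefficient values, the true effect of 0.5, and the helper names (`simulate`, `slope_on_x`) are all illustrative assumptions, and it is written in Python rather than the R used by the article.

```python
import random


def mean(v):
    return sum(v) / len(v)


def cov(a, b):
    """Population covariance of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)


def slope_on_x(y, x, z=None):
    """OLS coefficient on x when regressing y on x (and optionally z), with intercept."""
    if z is None:
        return cov(x, y) / cov(x, x)
    sxx, szz, sxz = cov(x, x), cov(z, z), cov(x, z)
    sxy, szy = cov(x, y), cov(z, y)
    det = sxx * szz - sxz ** 2
    # Solve the 2x2 normal equations for the coefficient on x.
    return (szz * sxy - sxz * szy) / det


def simulate(kind, n=20000, effect=0.5, seed=1):
    """Draw (x, y, z) from a hypothetical linear mechanism; `effect` is the true x -> y effect."""
    rng = random.Random(seed)
    xs, ys, zs = [], [], []
    for _ in range(n):
        if kind == "confounder":
            # z causes both x and y.
            z = rng.gauss(0, 1)
            x = z + rng.gauss(0, 1)
            y = effect * x + z + rng.gauss(0, 1)
        elif kind == "collider":
            # x and y both cause z.
            x = rng.gauss(0, 1)
            y = effect * x + rng.gauss(0, 1)
            z = x + y + rng.gauss(0, 1)
        else:
            raise ValueError(f"unknown mechanism: {kind}")
        xs.append(x)
        ys.append(y)
        zs.append(z)
    return xs, ys, zs


if __name__ == "__main__":
    for kind in ("confounder", "collider"):
        x, y, z = simulate(kind)
        print(kind,
              "unadjusted:", round(slope_on_x(y, x), 2),
              "adjusted for z:", round(slope_on_x(y, x, z), 2))
```

Under these assumed mechanisms, adjusting for `z` recovers the true effect when `z` is a confounder but biases the estimate when `z` is a collider. The data alone cannot tell the analyst which case applies.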
CITATION
D’Agostino McGowan, L., Gerke, T., & Barrett, M. (2024). Causal Inference Is Not Just a Statistics Problem. Journal of Statistics and Data Science Education, 32(2), 150–155. https://doi.org/10.1080/26939169.2023.2276446