Abstract
A central goal in statistics and data science is the construction of linear regression models for continuous variables of interest. Often, the objective is to examine the impact of one or more explanatory variables after adjusting for demographic covariates or other known or relevant factors. The traditional approach uses hypothesis testing to determine statistical significance, but the resulting p-values depend heavily on sample size. This is particularly problematic for large datasets or “overpowered” studies, where even the tiniest effects appear highly significant. Computing capabilities and cloud-enhanced data sharing have revolutionized the way data are used worldwide, from healthcare and investments to manufacturing and retail. While machine learning and artificial intelligence are improving predictive analytics, better statistical inference is needed to understand models and translate them into meaningful, actionable insights. The coefficient of partial determination (partial R²) is widely used in applied science to supplement hypothesis testing, but little work has been done to understand its statistical properties. In this work, we derive the complete distribution of partial R² and perform simulated and real-world data analyses to show the advantages of adding it to analyses of Big Data.
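The sample-size dependence described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' derivation or code: the data-generating model, the effect size of 0.02, and the function names `residualize`, `partial_r2_and_p`, and `simulate` are all assumptions made for illustration. It computes partial R² for one explanatory variable `x` after adjusting for one covariate `z`, using the standard identity that partial R² equals the squared correlation between the residuals of `y` on `z` and the residuals of `x` on `z`. The p-value uses a normal approximation to the t statistic, which is reasonable only for large samples.

```python
import math
import random


def residualize(v, z):
    """Residuals from a simple least-squares fit of v on z (with intercept)."""
    n = len(v)
    mean_v = sum(v) / n
    mean_z = sum(z) / n
    szz = sum((zi - mean_z) ** 2 for zi in z)
    szv = sum((zi - mean_z) * (vi - mean_v) for zi, vi in zip(z, v))
    slope = szv / szz
    return [(vi - mean_v) - slope * (zi - mean_z) for vi, zi in zip(v, z)]


def partial_r2_and_p(y, x, z):
    """Partial R^2 of x given z, plus an approximate two-sided p-value.

    Partial R^2 = (SSE_reduced - SSE_full) / SSE_reduced, computed here
    through the residuals of y and x after regressing each on z.
    """
    n = len(y)
    ry = residualize(y, z)
    rx = residualize(x, z)
    sxy = sum(a * b for a, b in zip(rx, ry))
    sxx = sum(a * a for a in rx)
    syy = sum(b * b for b in ry)
    pr2 = sxy ** 2 / (sxx * syy)
    df = n - 3  # intercept, z, and x are estimated in the full model
    t_stat = math.sqrt(pr2 * df / (1.0 - pr2))
    # Assumption: normal approximation to the t distribution (large df).
    p_value = math.erfc(t_stat / math.sqrt(2.0))
    return pr2, p_value


def simulate(n, effect=0.02, seed=0):
    """Hypothetical model: y = z + effect * x + noise, with x correlated with z."""
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    x = [0.5 * zi + rng.gauss(0.0, 1.0) for zi in z]
    y = [zi + effect * xi + rng.gauss(0.0, 1.0) for zi, xi in zip(z, x)]
    return partial_r2_and_p(y, x, z)


if __name__ == "__main__":
    for n in (200, 200_000):
        pr2, p = simulate(n)
        print(f"n={n:>7}  partial R^2={pr2:.5f}  approx. p-value={p:.3g}")
```

Under these assumptions, the large-sample run yields a very small p-value for `x` while its partial R² stays close to zero, which is the situation the abstract calls an "overpowered" study: the effect is statistically detectable but explains almost none of the remaining variance.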
Cite
Hawk, G. S., & Thompson, K. L. (2024). Deriving the Distribution and Exploring the Utility of Partial R² in the Era of Big Data. Journal of Statistical Theory and Applications, 23(2), 115–128. https://doi.org/10.1007/s44199-024-00074-y