Abstract
Linear regression is a simple yet powerful tool that is used extensively in every field where the relationships among variables are of interest. When linear regression is applied, the coefficient of determination, or R-squared (R²), is commonly reported as a measure of the model's goodness of fit. Despite its wide usage, however, R² is commonly misinterpreted as the proportion or percent of variation in the dependent variable that is explained by the independent variables (percent of variation explained, PVE). This study demonstrates that R² substantially overstates the true PVE. When the assumptions of linear regression are met, R² overstates PVE by up to 100%. For instance, when R² is 0.99, 0.80, 0.50, or 0.10, the true PVE is 0.90, 0.55, 0.29, or 0.05, respectively. This misinterpretation of R², which greatly exaggerates the effect of interventions or causes on outcomes, could exert undue influence on clinical decisions in medicine and on policy decisions in other fields such as environmental protection and climate change research. Therefore, when linear regression is applied, reporting the true PVE is warranted.
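The abstract does not state the formula behind these figures. As an illustration only, the quoted values are consistent with measuring variation on the standard-deviation scale rather than the variance scale, that is, PVE = 1 - sqrt(1 - R²). This relationship is an assumption inferred from the numbers above, not a formula taken from the abstract, and the function name `pve_from_r2` is hypothetical. A minimal Python sketch under that assumption:

```python
import math


def pve_from_r2(r2: float) -> float:
    """Convert R-squared to PVE, ASSUMING PVE = 1 - sqrt(1 - R^2).

    The formula is inferred from the example values in the abstract;
    it is not quoted from the abstract itself.
    """
    if not 0.0 <= r2 <= 1.0:
        raise ValueError("r2 must lie in [0, 1]")
    # sqrt(1 - R^2) is the ratio of residual SD to total SD,
    # so one minus that ratio is the share of SD accounted for.
    return 1.0 - math.sqrt(1.0 - r2)


# Print PVE for the R^2 values listed in the abstract.
for r2 in (0.99, 0.80, 0.50, 0.10):
    print(f"R^2 = {r2:.2f} -> PVE = {pve_from_r2(r2):.2f}")
```

Under this assumed relationship, R² / PVE equals 1 + sqrt(1 - R²), which approaches 2 as R² approaches 0; that would agree with the statement that R² overstates PVE by up to 100%.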
Cite
Gao, J. (2024). R-Squared (R²) – How much variation is explained? Research Methods in Medicine & Health Sciences, 5(4), 104–109. https://doi.org/10.1177/26320843231186398