Augmented backward elimination: A pragmatic and purposeful way to develop statistical models


Abstract

Statistical models are simple mathematical rules derived from empirical data describing the association between an outcome and several explanatory variables. In a typical modeling situation, statistical analysis often involves a large number of potential explanatory variables, and frequently only partial subject-matter knowledge is available. Therefore, selecting the most suitable variables for a model in an objective and practical manner is usually a non-trivial task. We briefly revisit the purposeful variable selection procedure suggested by Hosmer and Lemeshow, which combines significance and change-in-estimate criteria for variable selection, and critically discuss the change-in-estimate criterion. We show that using a significance-based threshold for the change-in-estimate criterion reduces to a simple significance-based selection of variables, as if the change-in-estimate criterion were not considered at all. Various extensions to the purposeful variable selection procedure are suggested. We propose to use backward elimination augmented with a standardized change-in-estimate criterion on the quantity of interest usually reported and interpreted in a model for variable selection. Augmented backward elimination has been implemented in a SAS macro for linear, logistic and Cox proportional hazards regression. The algorithm and its implementation were evaluated by means of a simulation study. Augmented backward elimination tends to select larger models than backward elimination and approximates the unselected model up to negligible differences in point estimates of the regression coefficients. On average, regression coefficients obtained after applying augmented backward elimination were less biased relative to the coefficients of correctly specified models than after backward elimination.
In summary, we propose augmented backward elimination as a reproducible variable selection algorithm that gives the analyst more flexibility in adapting model selection to a specific statistical modeling situation.
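
The idea described in the abstract can be illustrated with a minimal Python sketch for linear regression. It is not the authors' SAS macro, and several details are assumptions made for illustration only: a single exposure of interest, normal-approximation p-values, the thresholds `alpha=0.2` and `tau=0.05`, scaling the change in estimate by SD(exposure)/SD(outcome), and the helper names `invert`, `ols`, `abe` and `make_data`. A variable is dropped only if it is non-significant and dropping it would change the exposure coefficient by less than `tau` on the standardized scale. The change is computed from the fitted model with the restricted least squares identity, so no refit is needed to evaluate a candidate.

```python
import math
import random
import statistics

def invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    k = len(a)
    m = [list(map(float, row)) + [float(i == j) for j in range(k)]
         for i, row in enumerate(a)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        piv = m[c][c]
        m[c] = [v / piv for v in m[c]]
        for r in range(k):
            if r != c:
                f = m[r][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [row[k:] for row in m]

def ols(X, y):
    """Ordinary least squares: return (coefficients, covariance matrix)."""
    n, k = len(X), len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    rss = sum((yi - sum(b * x for b, x in zip(beta, r))) ** 2
              for r, yi in zip(X, y))
    s2 = rss / (n - k)
    return beta, [[s2 * v for v in row] for row in inv]

def abe(columns, y, exposure, alpha=0.2, tau=0.05):
    """Simplified augmented backward elimination (illustrative assumptions)."""
    active = list(columns)
    # Assumed standardization: SD of the exposure over SD of the outcome.
    scale = statistics.stdev(columns[exposure]) / statistics.stdev(y)
    while True:
        names = ["const"] + active
        X = [[1.0] + [columns[v][i] for v in active] for i in range(len(y))]
        beta, cov = ols(X, y)
        p = names.index(exposure)
        candidates = []
        for a, v in enumerate(names):
            if v in ("const", exposure):
                continue  # intercept and exposure are never eliminated
            z = abs(beta[a]) / math.sqrt(cov[a][a])
            pval = math.erfc(z / math.sqrt(2.0))  # two-sided, normal approx.
            if pval > alpha:
                candidates.append((pval, v, a))
        # Try the least significant candidate first.
        for pval, v, a in sorted(candidates, reverse=True):
            # Change in the exposure coefficient if variable a were dropped.
            delta = -beta[a] * cov[p][a] / cov[a][a]
            if abs(delta) * scale < tau:
                active.remove(v)
                break
        else:
            return active  # nothing could be dropped: selection has converged

def make_data(n=200, seed=1):
    """Synthetic data: z confounds x and y, noise is unrelated to both."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]
    x = [0.5 * zi + rng.gauss(0, 1) for zi in z]
    noise = [rng.gauss(0, 1) for _ in range(n)]
    y = [xi + 2.0 * zi + rng.gauss(0, 1) for xi, zi in zip(x, z)]
    return {"x": x, "z": z, "noise": noise}, y

cols, y = make_data()
print(abe(cols, y, exposure="x"))
```

Setting `tau=0` disables elimination entirely, while a very large `tau` reduces the procedure to plain significance-based backward elimination, which is one way to see how the change-in-estimate criterion lets the analyst tune the selection between these two extremes.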


Citation (APA)

Dunkler, D., Plischke, M., Leffondré, K., & Heinze, G. (2014). Augmented backward elimination: A pragmatic and purposeful way to develop statistical models. PLoS ONE, 9(11). https://doi.org/10.1371/journal.pone.0113677
