Small Sample Behaviors of the Delete-d Cross Validation Statistic

  • Kastens J

Abstract

Built upon an iterative process of resampling without replacement and out-of-sample prediction, the delete-d cross validation statistic CV(d) provides a robust estimate of forecast error variance. To compute CV(d), a dataset consisting of n observations of predictor and response values is systematically and repeatedly partitioned (split) into subsets of size n-d (used for model training) and d (used for model testing). Two aspects of CV(d) are explored in this paper. First, estimates for the unknown expected value E[CV(d)] are simulated in an OLS linear regression setting. Results suggest general formulas for E[CV(d)] dependent on σ² ("true" model error variance), n-d (training set size), and p (number of predictors in the model). The conjectured E[CV(d)] formulas are connected back to theory and generalized. The formulas break down at the two largest allowable d values (d = n-p-1 and d = n-p, the 1 and 0 degrees of freedom cases), and numerical instabilities are observed at these points. An explanation for this distinct behavior remains an open question. For the second analysis, simulation is used to demonstrate how the previously established asymptotic conditions {d/n → 1 and n-d → ∞ as n → ∞} required for optimal linear model selection using CV(d) for model ranking are manifested in the smallest sample setting, using either independent or correlated candidate predictors.
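
The partition-and-predict procedure described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's code: the function name `delete_d_cv`, the exhaustive enumeration of all C(n, d) splits, the no-intercept OLS fit, and the averaging of squared test errors are assumptions made here, since the abstract does not spell out those details.

```python
from itertools import combinations

import numpy as np


def delete_d_cv(X, y, d):
    """Hypothetical helper: mean squared out-of-sample error over all C(n, d) splits.

    Assumes an OLS fit without an added intercept; the paper may define
    CV(d) with different conventions.
    """
    n = len(y)
    split_errors = []
    for test in combinations(range(n), d):
        test_idx = np.array(test)
        train_mask = np.ones(n, dtype=bool)
        train_mask[test_idx] = False  # the remaining n - d rows train the model
        # Least-squares fit on the training subset of size n - d
        beta, *_ = np.linalg.lstsq(X[train_mask], y[train_mask], rcond=None)
        # Squared prediction error on the d held-out observations
        resid = y[test_idx] - X[test_idx] @ beta
        split_errors.append(np.mean(resid ** 2))
    return float(np.mean(split_errors))


# Small simulated example (illustrative values only)
rng = np.random.default_rng(0)
n, p, d = 10, 2, 3
X = rng.normal(size=(n, p))
true_beta = np.array([1.0, -2.0])
y = X @ true_beta + rng.normal(size=n)
cv = delete_d_cv(X, y, d)
```

Enumerating every split is only feasible for small n, which matches the small-sample focus here; for larger datasets one would typically average over a random subset of splits instead.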

Cite

Kastens, J. H. (2015). Small Sample Behaviors of the Delete-d Cross Validation Statistic. Open Journal of Statistics, 05(05), 382–392. https://doi.org/10.4236/ojs.2015.55040
