Prediction error estimation: A comparison of resampling methods

Annette M. Molinaro; Richard Simon; Ruth M. Pfeiffer

Journal Article (Open Access)

Bioinformatics (2005) 21(15) 3301-3307

DOI: 10.1093/bioinformatics/bti499


Abstract

Motivation: In genomic studies, thousands of features are collected on relatively few samples. One of the goals of these studies is to build classifiers to predict the outcome of future observations. There are three inherent steps to this process: feature selection, model selection and prediction assessment. With a focus on prediction assessment, we compare several methods for estimating the 'true' prediction error of a prediction model in the presence of feature selection.

Results: For small studies where features are selected from thousands of candidates, the resubstitution and simple split-sample estimates are seriously biased. In these small samples, leave-one-out cross-validation (LOOCV), 10-fold cross-validation (CV) and the .632+ bootstrap have the smallest bias for diagonal discriminant analysis, nearest neighbor and classification trees. LOOCV and 10-fold CV have the smallest bias for linear discriminant analysis. Additionally, LOOCV, 5- and 10-fold CV, and the .632+ bootstrap have the lowest mean square error. The .632+ bootstrap is quite biased in small sample sizes with strong signal-to-noise ratios. Differences in performance among resampling methods are reduced as the number of available specimens increases.
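To make the contrast between resubstitution and cross-validation concrete, the following is a minimal sketch rather than the authors' simulation design. It assumes pure-noise Gaussian data (so the true error of any classifier is 50%), a simple nearest-centroid rule standing in for the classifiers named in the abstract, and a mean-difference filter for feature selection. All function names, sample sizes and the seed are hypothetical choices made for illustration. The key point is that feature selection is repeated inside every training fold, so the held-out samples never influence which features are chosen.

```python
import random


def class_means(X, y, feats):
    """Per-class mean of each feature index in `feats` (labels are 0/1)."""
    means = {}
    for label in (0, 1):
        rows = [row for row, lab in zip(X, y) if lab == label]
        means[label] = [sum(r[j] for r in rows) / len(rows) for j in feats]
    return means


def select_features(X, y, k):
    """Keep the k features with the largest gap between class means."""
    means = class_means(X, y, range(len(X[0])))
    gaps = [abs(a - b) for a, b in zip(means[0], means[1])]
    return sorted(range(len(gaps)), key=lambda j: gaps[j], reverse=True)[:k]


def count_errors(train, test, X, y, k):
    """Select features and fit centroids on `train` only; count errors on `test`."""
    Xtr, ytr = [X[i] for i in train], [y[i] for i in train]
    feats = select_features(Xtr, ytr, k)
    cents = class_means(Xtr, ytr, feats)
    wrong = 0
    for i in test:
        # Squared Euclidean distance to each class centroid.
        d = {lab: sum((X[i][j] - c) ** 2 for j, c in zip(feats, cents[lab]))
             for lab in (0, 1)}
        pred = 0 if d[0] <= d[1] else 1
        wrong += pred != y[i]
    return wrong


def cv_error(X, y, k, n_folds, rng):
    """n_folds-fold CV error; n_folds == len(X) gives leave-one-out."""
    idx = list(range(len(X)))
    rng.shuffle(idx)
    wrong = 0
    for f in range(n_folds):
        test = idx[f::n_folds]
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        wrong += count_errors(train, test, X, y, k)
    return wrong / len(X)


def compare(n=40, p=1000, k=10, seed=0):
    """Error estimates on noise data where no feature carries signal."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [i % 2 for i in range(n)]  # balanced labels, unrelated to X
    everyone = list(range(n))
    return {
        # Train and test on the same samples: optimistically biased.
        "resubstitution": count_errors(everyone, everyone, X, y, k) / n,
        "10-fold CV": cv_error(X, y, k, 10, rng),
        "LOOCV": cv_error(X, y, k, n, rng),
    }


results = compare()
for name, err in results.items():
    print(f"{name}: {err:.3f}")
```

Under these assumptions the resubstitution estimate should fall far below the 50% true error, while the two cross-validated estimates should land much closer to it. Moving the `select_features` call outside the fold loop would reintroduce most of the optimism, which is one way to see why selection has to be part of the resampled procedure.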

Citation (APA)

Molinaro, A. M., Simon, R., & Pfeiffer, R. M. (2005). Prediction error estimation: A comparison of resampling methods. Bioinformatics, 21(15), 3301–3307. https://doi.org/10.1093/bioinformatics/bti499
