Shedding light on the role of sample sizes and splitting proportions in out-of-sample tests: A Monte Carlo cross-validation approach


Abstract

We examine whether the popular 2/3 rule-of-thumb splitting criterion used in out-of-sample evaluation of predictive econometric and machine learning models makes sense. We conduct simulations of the predictive performance of logistic regression and decision tree algorithms under varying split points and sample sizes. Our non-exhaustive repeated random sub-sampling simulation approach, known as Monte Carlo cross-validation, indicates that while the 2/3 rule-of-thumb works, a spectrum of different splitting proportions yields equally compelling results. Furthermore, our results indicate that the size of the complete sample has little impact on the applicability of the 2/3 rule-of-thumb. However, our analysis reveals that when the training sample is very small or very large relative to the complete sample, the variation of the predictive accuracy can lead to misleading results. Our results are especially important for IS researchers considering the use of out-of-sample methods for evaluating their predictive models.
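The Monte Carlo cross-validation procedure described above can be sketched as follows. This is a minimal illustration, not the paper's exact experimental setup: the synthetic dataset, the number of repetitions, and the set of split proportions compared (including the 2/3 rule of thumb) are all illustrative choices.

```python
# Monte Carlo cross-validation: non-exhaustive repeated random sub-sampling.
# For each split proportion, the sample is randomly partitioned many times
# into train/test sets, a model is refit each time, and test accuracy is
# aggregated across repetitions.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical synthetic binary classification sample (stand-in for real data).
n = 500
X = rng.normal(size=(n, 3))
y = (X @ np.array([1.5, -2.0, 0.5]) + rng.normal(size=n) > 0).astype(int)

def mccv_accuracy(X, y, train_prop, n_splits=50, seed=42):
    """Mean and std of test accuracy over n_splits random train/test splits."""
    split_rng = np.random.default_rng(seed)
    accs = []
    for _ in range(n_splits):
        idx = split_rng.permutation(len(y))          # random sub-sampling
        cut = int(train_prop * len(y))               # splitting point
        tr, te = idx[:cut], idx[cut:]
        model = LogisticRegression(max_iter=1000).fit(X[tr], y[tr])
        accs.append(model.score(X[te], y[te]))
    return float(np.mean(accs)), float(np.std(accs))

# Compare several splitting proportions, including the 2/3 rule of thumb.
for p in (0.5, 2 / 3, 0.8, 0.9):
    mean_acc, sd_acc = mccv_accuracy(X, y, p)
    print(f"train proportion {p:.2f}: accuracy {mean_acc:.3f} +/- {sd_acc:.3f}")
```

The standard deviation across repetitions is the quantity the abstract warns about: for extreme training proportions, the test set (or training set) becomes very small, so accuracy estimates fluctuate more from split to split.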

Citation (APA)

Janze, C. (2017). Shedding light on the role of sample sizes and splitting proportions in out-of-sample tests: A Monte Carlo cross-validation approach. In Atas da Conferencia da Associacao Portuguesa de Sistemas de Informacao (Vol. 17, pp. 245–259). Associacao Portuguesa de Sistemas de Informacao. https://doi.org/10.18803/capsi.v17.245-259
