Shedding light on the role of sample sizes and splitting proportions in out-of-sample tests: A Monte Carlo cross-validation approach


Abstract

We examine whether the popular 2/3 rule-of-thumb splitting criterion used in out-of-sample evaluation of predictive econometric and machine learning models makes sense. We conduct simulations of the predictive performance of logistic regression and decision tree algorithms under varying split points and sample sizes. Our non-exhaustive repeated random sub-sampling simulation approach, known as Monte Carlo cross-validation, indicates that while the 2/3 rule-of-thumb works, a spectrum of different splitting proportions yields equally compelling results. Furthermore, our results indicate that the size of the complete sample has little impact on the applicability of the 2/3 rule-of-thumb. However, our analysis reveals that when the training sample is very small or very large relative to the complete sample, the variation of the predictive accuracy can lead to misleading results. Our results are especially important for IS researchers considering the use of out-of-sample methods for evaluating their predictive models.
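The Monte Carlo cross-validation procedure described above can be sketched as follows. This is a minimal illustration, not the paper's exact experimental setup: the synthetic dataset, the number of repetitions, and the set of split proportions compared (including the 2/3 rule of thumb) are all illustrative choices.

```python
# Monte Carlo cross-validation: non-exhaustive repeated random sub-sampling.
# For each split proportion, the sample is randomly partitioned many times
# into train/test sets, a model is refit each time, and test accuracy is
# aggregated across repetitions.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical synthetic binary classification sample (stand-in for real data).
n = 500
X = rng.normal(size=(n, 3))
y = (X @ np.array([1.5, -2.0, 0.5]) + rng.normal(size=n) > 0).astype(int)

def mccv_accuracy(X, y, train_prop, n_splits=50, seed=42):
    """Mean and std of test accuracy over n_splits random train/test splits."""
    split_rng = np.random.default_rng(seed)
    accs = []
    for _ in range(n_splits):
        idx = split_rng.permutation(len(y))          # random sub-sampling
        cut = int(train_prop * len(y))               # splitting point
        tr, te = idx[:cut], idx[cut:]
        model = LogisticRegression(max_iter=1000).fit(X[tr], y[tr])
        accs.append(model.score(X[te], y[te]))
    return float(np.mean(accs)), float(np.std(accs))

# Compare several splitting proportions, including the 2/3 rule of thumb.
for p in (0.5, 2 / 3, 0.8, 0.9):
    mean_acc, sd_acc = mccv_accuracy(X, y, p)
    print(f"train proportion {p:.2f}: accuracy {mean_acc:.3f} +/- {sd_acc:.3f}")
```

The standard deviation across repetitions is the quantity the abstract warns about: for extreme training proportions, the test set (or training set) becomes very small, so accuracy estimates fluctuate more from split to split.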

Citation (APA)

Janze, C. (2017). Shedding light on the role of sample sizes and splitting proportions in out-of-sample tests: A Monte Carlo cross-validation approach. In Atas da Conferencia da Associacao Portuguesa de Sistemas de Informacao (Vol. 17, pp. 245–259). Associacao Portuguesa de Sistemas de Informacao. https://doi.org/10.18803/capsi.v17.245-259
