On Estimating Model in Feature Selection with Cross-Validation


Abstract

Both wrapper and hybrid feature selection methods require the intervention of a learning algorithm to train model parameters. Preset parameters and the dataset are used to construct several sub-optimal models, from which the final model is selected. This raises two questions: how should the performance of these sub-optimal models be evaluated, and how do different evaluation methods affect the outcome of feature selection? To address the evaluation of predictive models in feature selection, we chose a hybrid feature selection algorithm, FDHSFFS, and conducted comparative experiments with five different cross-validation (CV) methods on four UCI datasets that differ widely in feature dimension and sample size. The experimental results show that, during feature selection, twofold CV and leave-one-out CV are better suited to model evaluation on low-dimensional, small-sample datasets, whereas tenfold nested CV and tenfold CV are better suited to high-dimensional datasets; tenfold nested CV comes close to an unbiased estimate, and different optimal models may select the same approximately optimal feature subset.
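To make the comparison concrete, the sketch below contrasts plain tenfold CV with tenfold nested CV for scoring a feature-selection pipeline. It is a minimal illustration, not the paper's method: scikit-learn's SelectKBest stands in for FDHSFFS, and the dataset, classifier, and candidate values of k are illustrative assumptions.

```python
# Sketch: plain k-fold CV vs. nested CV for evaluating a feature-selection
# pipeline. SelectKBest is a stand-in for the paper's FDHSFFS algorithm.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)  # a UCI dataset, for illustration

# Feature selection and classifier in one pipeline, so the feature subset is
# refit inside every training fold and never sees the held-out fold.
pipe = Pipeline([
    ("select", SelectKBest(score_func=f_classif)),
    ("clf", SVC(kernel="linear")),
])
param_grid = {"select__k": [5, 10, 20, 30]}  # candidate subset sizes (assumed)

inner = KFold(n_splits=10, shuffle=True, random_state=0)
outer = KFold(n_splits=10, shuffle=True, random_state=1)

# Plain tenfold CV: the same folds both pick the feature subset and report
# the score, which can bias the estimate optimistically.
search = GridSearchCV(pipe, param_grid, cv=inner)
search.fit(X, y)
print("plain 10-fold CV score:", search.best_score_)

# Tenfold nested CV: outer folds score a model whose feature subset was
# chosen only on inner (training-side) data, so the estimate is close to
# unbiased, matching the paper's observation.
nested_scores = cross_val_score(search, X, y, cv=outer)
print("nested 10-fold CV score:", nested_scores.mean())
```

Swapping the outer splitter for KFold(n_splits=2) or LeaveOneOut() reproduces the other evaluation schemes compared in the paper.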

Citation (APA)

Qi, C., Diao, J., & Qiu, L. (2019). On Estimating Model in Feature Selection with Cross-Validation. IEEE Access, 7, 33454–33463. https://doi.org/10.1109/ACCESS.2019.2892062
