Effects of many feature candidates in feature selection and classification

Abstract

We address the problems that arise when many feature candidates are analyzed while performing feature selection and error estimation on a limited data set. A Monte Carlo study of multivariate normally distributed data was performed to illustrate these problems. Two feature selection methods are tested: Plus-1-Minus-1 and Sequential Forward Floating Selection. The simulations demonstrate that, in order to find the correct features, the number of features initially analyzed is an important factor in addition to the number of samples. Moreover, the ratio of training samples to feature candidates that is sufficient is not a constant; it depends on the number of feature candidates, the number of training samples, and the Mahalanobis distance between the classes. The two feature selection methods analyzed gave the same results. Furthermore, the simulations demonstrate how the leave-one-out error estimate can be highly biased when feature selection is performed on the same data used for error estimation. It may even indicate complete separation of the classes when no real difference between the classes exists.
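The selection bias described in the abstract is straightforward to reproduce. The minimal sketch below (Python with NumPy; not the authors' original code) draws both classes from the same distribution, screens a large number of candidate features, and compares a leave-one-out (LOO) error estimate in which selection is done once on the full data set against one in which selection is repeated inside every LOO fold. The sample sizes, the number of candidates, the univariate ranking criterion, and the nearest-class-mean classifier are all illustrative assumptions standing in for the Plus-1-Minus-1 and Sequential Forward Floating Selection wrappers studied in the paper.

# Minimal sketch (assumed parameters, not the authors' experimental setup):
# both classes come from the SAME distribution, so the true error is 0.5.
import numpy as np

rng = np.random.default_rng(0)

n_per_class = 20      # training samples per class (assumed value)
n_candidates = 200    # number of feature candidates (assumed value)
n_selected = 5        # number of features finally kept (assumed value)

# Both classes drawn from one standard normal distribution:
# there is no real class difference to find.
X = rng.standard_normal((2 * n_per_class, n_candidates))
y = np.repeat([0, 1], n_per_class)


def select_features(X, y, k):
    """Rank features by absolute difference of class means (a crude filter,
    used here as a stand-in for the wrapper selection methods in the paper)."""
    diff = np.abs(X[y == 0].mean(axis=0) - X[y == 1].mean(axis=0))
    return np.argsort(diff)[-k:]


def loo_error(X, y, reselect):
    """LOO error of a nearest-class-mean classifier.

    reselect=False: features are selected once on the full data set
    (the biased protocol criticized in the paper).
    reselect=True: selection is repeated inside every LOO fold
    (the honest protocol)."""
    if not reselect:
        fixed = select_features(X, y, n_selected)
    errors = 0
    for i in range(len(y)):
        train = np.arange(len(y)) != i
        feats = select_features(X[train], y[train], n_selected) if reselect else fixed
        Xt, yt = X[train][:, feats], y[train]
        mu0, mu1 = Xt[yt == 0].mean(axis=0), Xt[yt == 1].mean(axis=0)
        x = X[i, feats]
        pred = 0 if np.sum((x - mu0) ** 2) < np.sum((x - mu1) ** 2) else 1
        errors += pred != y[i]
    return errors / len(y)


print("LOO error, selection on all data :", loo_error(X, y, reselect=False))
print("LOO error, selection inside LOO  :", loo_error(X, y, reselect=True))
# Typical outcome: the first estimate falls far below 0.5 (sometimes near 0,
# i.e. apparent complete separation), while the second stays near 0.5,
# the true error rate for identical classes.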

Cite

APA

Schulerud, H., & Albregtsen, F. (2002). Effects of many feature candidates in feature selection and classification. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 2396, pp. 480–487). Springer Verlag. https://doi.org/10.1007/3-540-70659-3_50
