Effects of many feature candidates in feature selection and classification

Abstract

We address the problems that arise when many feature candidates are analyzed while performing feature selection and error estimation on a limited data set. A Monte Carlo study of multivariate normally distributed data was performed to illustrate these problems. Two feature selection methods are tested: Plus-1-Minus-1 and Sequential Forward Floating Selection. The simulations demonstrate that, in order to find the correct features, the number of features initially analyzed is an important factor in addition to the number of samples. Moreover, the ratio of training samples to feature candidates that is sufficient is not a constant; it depends on the number of feature candidates, the number of training samples, and the Mahalanobis distance between the classes. The two feature selection methods analyzed gave the same results. Furthermore, the simulations demonstrate how the leave-one-out error estimate can be highly biased when feature selection is performed on the same data used for error estimation. It may even indicate complete separation of the classes when no real difference between the classes exists.
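The selection bias described in the abstract is straightforward to reproduce. The minimal sketch below (Python with NumPy; not the authors' original code) draws both classes from the same distribution, screens a large number of candidate features, and compares a leave-one-out (LOO) error estimate in which selection is done once on the full data set against one in which selection is repeated inside every LOO fold. The sample sizes, the number of candidates, the univariate ranking criterion, and the nearest-class-mean classifier are all illustrative assumptions standing in for the Plus-1-Minus-1 and Sequential Forward Floating Selection wrappers studied in the paper.

# Minimal sketch (assumed parameters, not the authors' experimental setup):
# both classes come from the SAME distribution, so the true error is 0.5.
import numpy as np

rng = np.random.default_rng(0)

n_per_class = 20      # training samples per class (assumed value)
n_candidates = 200    # number of feature candidates (assumed value)
n_selected = 5        # number of features finally kept (assumed value)

# Both classes drawn from one standard normal distribution:
# there is no real class difference to find.
X = rng.standard_normal((2 * n_per_class, n_candidates))
y = np.repeat([0, 1], n_per_class)


def select_features(X, y, k):
    """Rank features by absolute difference of class means (a crude filter,
    used here as a stand-in for the wrapper selection methods in the paper)."""
    diff = np.abs(X[y == 0].mean(axis=0) - X[y == 1].mean(axis=0))
    return np.argsort(diff)[-k:]


def loo_error(X, y, reselect):
    """LOO error of a nearest-class-mean classifier.

    reselect=False: features are selected once on the full data set
    (the biased protocol criticized in the paper).
    reselect=True: selection is repeated inside every LOO fold
    (the honest protocol)."""
    if not reselect:
        fixed = select_features(X, y, n_selected)
    errors = 0
    for i in range(len(y)):
        train = np.arange(len(y)) != i
        feats = select_features(X[train], y[train], n_selected) if reselect else fixed
        Xt, yt = X[train][:, feats], y[train]
        mu0, mu1 = Xt[yt == 0].mean(axis=0), Xt[yt == 1].mean(axis=0)
        x = X[i, feats]
        pred = 0 if np.sum((x - mu0) ** 2) < np.sum((x - mu1) ** 2) else 1
        errors += pred != y[i]
    return errors / len(y)


print("LOO error, selection on all data :", loo_error(X, y, reselect=False))
print("LOO error, selection inside LOO  :", loo_error(X, y, reselect=True))
# Typical outcome: the first estimate falls far below 0.5 (sometimes near 0,
# i.e. apparent complete separation), while the second stays near 0.5,
# the true error rate for identical classes.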

Cite

APA

Schulerud, H., & Albregtsen, F. (2002). Effects of many feature candidates in feature selection and classification. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 2396, pp. 480–487). Springer Verlag. https://doi.org/10.1007/3-540-70659-3_50
