Best subset feature selection for massive mixed-type problems

Eugene Tuv; Alexander Borisov; Kari Torkkola

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2006) 4224 LNCS 1048-1056

DOI: 10.1007/11875581_125

Abstract

We address the problem of identifying a non-redundant subset of important variables. All modern feature selection approaches, including filters, wrappers, and embedded methods, experience problems in very general settings with massive mixed-type data and with complex relationships between the inputs and the target. We propose an efficient ensemble-based approach that measures statistical independence between a target and a potentially very large number of inputs, including any meaningful order of interactions between them, removes redundancies from the relevant ones, and finally ranks the variables in the identified minimum feature set. Experiments with synthetic data illustrate the sensitivity and the selectivity of the method, whereas its scalability is demonstrated with a real car sensor database. © Springer-Verlag Berlin Heidelberg 2006.
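The abstract names three stages: testing each input's dependence on the target, removing redundant inputs among the relevant ones, and ranking the survivors. The Python sketch below mirrors those stages only loosely and rests on assumptions that are not from the paper: it swaps the ensemble-based measure for a plug-in mutual-information score, takes its significance threshold from randomly permuted "contrast" copies of the inputs, treats the target as categorical, and bins numeric inputs into equal-frequency intervals. Because mutual information is computed one input at a time, it cannot detect inputs that matter only through interactions, which the approach described above is designed to handle. The names `select_features` and `discretize` and their parameters are hypothetical; this is not the authors' code.

```python
import bisect
import math
import random
from collections import Counter


def discretize(column, bins=5):
    """Bin a numeric column into equal-frequency intervals; return other columns unchanged.

    Assumption: a column counts as numeric only if every value is an int or float (bool excluded).
    """
    if not all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in column):
        return list(column)
    ordered = sorted(column)
    # Cut points at the 1/bins, 2/bins, ... quantiles; tied values land in the same bin.
    cuts = [ordered[k * len(ordered) // bins] for k in range(1, bins)]
    return [bisect.bisect_right(cuts, v) for v in column]


def entropy(xs):
    """Plug-in Shannon entropy (nats) of a sequence of hashable labels."""
    n = len(xs)
    return -sum(c / n * math.log(c / n) for c in Counter(xs).values())


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) = H(X) + H(Y) - H(X,Y), in nats."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def select_features(columns, target, n_contrasts=20, redundancy=0.9, seed=0):
    """Return indices of relevant, non-redundant columns, most relevant first.

    Hypothetical helper: `columns` is a list of equal-length feature columns (mixed
    types allowed) and `target` is a list of categorical class labels.
    """
    rng = random.Random(seed)
    cols = [discretize(c) for c in columns]
    scores = [mutual_information(c, target) for c in cols]

    # Stage 1, relevance: a permuted copy of a column keeps its marginal distribution
    # but loses any dependence on the target, so the largest score among these
    # contrasts serves as a noise threshold.
    threshold = 0.0
    for c in cols:
        for _ in range(n_contrasts):
            contrast = c[:]
            rng.shuffle(contrast)
            threshold = max(threshold, mutual_information(contrast, target))
    relevant = [j for j, s in enumerate(scores) if s > threshold]

    # Stage 2, redundancy removal: visit relevant columns from most to least relevant
    # and drop any whose entropy is mostly shared with a column already kept.
    relevant.sort(key=lambda j: (-scores[j], j))
    kept = []
    for j in relevant:
        h = entropy(cols[j])
        if h > 0 and all(mutual_information(cols[j], cols[k]) / h < redundancy for k in kept):
            kept.append(j)

    # Stage 3, ranking: `kept` is already ordered by decreasing relevance score.
    return kept
```

On a toy dataset where one categorical column determines the target and a second column is an exact copy of it, `select_features` should keep the first column and drop the copy as redundant. Taking the strict maximum over all contrasts is a deliberately conservative threshold; a high percentile of the contrast scores would be a natural, less strict alternative.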

Citation (APA)

Tuv, E., Borisov, A., & Torkkola, K. (2006). Best subset feature selection for massive mixed-type problems. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4224 LNCS, pp. 1048–1056). Springer Verlag. https://doi.org/10.1007/11875581_125