Feature selection for multiclass binary data

Abstract

Feature selection in binary datasets is an important task in many real-world machine learning applications, such as document classification, genomic data analysis, and image recognition. Although many algorithms are available, selecting features that distinguish all classes from one another in a multiclass binary dataset remains a challenge. Furthermore, many existing feature selection methods incur unnecessary computational costs on binary data because they are not specifically designed for it. We show that, by exploiting the symmetry and feature value imbalance of binary datasets, more efficient feature selection measures can be developed that better distinguish the classes in multiclass binary datasets. Using these measures, we propose a greedy feature selection algorithm, CovSkew, for multiclass binary data. We show that CovSkew achieves a high accuracy gain over baseline methods, up to approximately 40%, especially when the selected feature subset is small. We also show that CovSkew has low computational costs compared with most of the baselines.
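
The abstract does not define the CovSkew measures, so the following is only an illustrative sketch of the general idea of greedy feature selection on multiclass binary data, not the authors' algorithm. It assumes a hypothetical criterion: a feature "separates" a pair of classes when the fraction of ones in that feature differs between the two classes by at least `min_gap`, and features are picked greedily to separate as many still-unseparated class pairs as possible. The names `class_one_rates`, `greedy_select`, and `min_gap` are invented for this example.

```python
from itertools import combinations


def class_one_rates(X, y):
    """Per class, the fraction of rows in which each binary feature equals 1."""
    rates = {}
    for label in sorted(set(y)):
        rows = [row for row, lab in zip(X, y) if lab == label]
        n_features = len(rows[0])
        rates[label] = [sum(r[j] for r in rows) / len(rows) for j in range(n_features)]
    return rates


def greedy_select(X, y, k, min_gap=0.5):
    """Pick up to k features, each time taking the one that separates the most
    class pairs not yet separated. `min_gap` is a hypothetical threshold,
    not a parameter taken from the paper."""
    rates = class_one_rates(X, y)
    pairs = set(combinations(sorted(rates), 2))
    n_features = len(X[0])
    # For each feature, the class pairs whose one-rates differ by at least min_gap.
    covers = [
        {(a, b) for a, b in pairs if abs(rates[a][j] - rates[b][j]) >= min_gap}
        for j in range(n_features)
    ]
    selected, uncovered = [], set(pairs)
    while len(selected) < k and uncovered:
        best = max(
            (j for j in range(n_features) if j not in selected),
            key=lambda j: len(covers[j] & uncovered),
            default=None,
        )
        # Stop when no remaining feature separates any uncovered pair.
        if best is None or not covers[best] & uncovered:
            break
        selected.append(best)
        uncovered -= covers[best]
    return selected


# Toy data: three classes, three binary features (feature 2 is constant).
X = [[1, 0, 1], [1, 0, 1], [0, 0, 1], [0, 0, 1], [0, 1, 1], [0, 1, 1]]
y = ["a", "a", "b", "b", "c", "c"]
print(greedy_select(X, y, k=3))  # -> [0, 1]
```

On the toy data, feature 0 separates class a from b and c, feature 1 then separates b from c, and the constant feature 2 is never chosen, so two features suffice to distinguish all three classes.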

Cite

Perera, K., Chan, J., & Karunasekera, S. (2018). Feature selection for multiclass binary data. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10939 LNAI, pp. 52–63). Springer Verlag. https://doi.org/10.1007/978-3-319-93040-4_5
