Simulating Complexity Measures on Imbalanced Datasets


Abstract

Class imbalance alone does not make a classification task challenging. Classification models perform poorly on the minority class when the dataset also presents other difficulties, such as class overlap or complex decision boundaries. Data complexity measures can identify such difficulties and therefore help in dealing with imbalanced datasets; they capture information about data overlap, neighborhood structure, and linearity. Although these measures were recently decomposed by class to handle imbalanced datasets, their high computational cost prevents their use in time-constrained applications, such as recommendation systems, and on high-dimensional datasets. In this paper, we use a meta-learning approach to estimate the decomposed data complexity measures. We show that the simulated measures can assess dataset difficulty after preprocessing techniques are applied to different sample sizes. We also show that this approach is significantly faster than computing the original measures, with a statistically similar estimation error for both classes.
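
To illustrate what a class-decomposed neighborhood measure looks like, and why computing it exactly can be expensive, here is a minimal sketch. It assumes the N3 measure (the leave-one-out error rate of a 1-nearest-neighbor classifier) as the example and a simple per-class decomposition that reports the error rate separately for each class. It is not the authors' implementation, and the function name `n3_per_class` is hypothetical. The brute-force neighbor search is quadratic in the number of examples.

```python
import math


def n3_per_class(X, y):
    """Hypothetical per-class N3: leave-one-out 1-NN error rate for each class.

    X is a list of numeric feature tuples, y the matching list of labels.
    Returns {label: fraction of that class's examples whose nearest
    neighbour (excluding itself) has a different label}.
    """
    n = len(X)
    if n < 2:
        raise ValueError("need at least two examples")
    errors, counts = {}, {}
    for i in range(n):
        # Brute-force nearest-neighbour search: O(n) per point, O(n^2) overall.
        best_j, best_d = None, math.inf
        for j in range(n):
            if j == i:
                continue
            d = math.dist(X[i], X[j])
            if d < best_d:
                best_d, best_j = d, j
        counts[y[i]] = counts.get(y[i], 0) + 1
        if y[best_j] != y[i]:
            errors[y[i]] = errors.get(y[i], 0) + 1
    return {c: errors.get(c, 0) / counts[c] for c in counts}


if __name__ == "__main__":
    # Toy data: class 0 is the majority; one class-1 point (3.4, 0) overlaps it.
    X = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0),
         (3.4, 0.0), (10.0, 0.0), (11.0, 0.0)]
    y = [0, 0, 0, 0, 1, 1, 1]
    print(n3_per_class(X, y))  # class 0: 0.25, class 1: about 0.333
```

On this toy data, the overlapping pair (3.0, 0) and (3.4, 0) misclassify each other, so the minority class gets a higher per-class value (1/3) than the majority (1/4). A single pooled N3 value (2/7) would not show which class is affected. The quadratic loop illustrates the kind of cost that motivates estimating such measures rather than computing them directly.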

Citation (APA)

Barella, V. H., Garcia, L. P. F., & de Carvalho, A. C. P. L. F. (2020). Simulating Complexity Measures on Imbalanced Datasets. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 12320 LNAI, pp. 498–512). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-030-61380-8_34