Simulating Complexity Measures on Imbalanced Datasets

Victor H. Barella; Luís P.F. Garcia; André C.P.L.F. de Carvalho

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Vol. 12320 LNAI, pp. 498–512 (2020)

DOI: 10.1007/978-3-030-61380-8_34

Citations: 3

Readers: 5

Abstract

Classification tasks using imbalanced datasets are not challenging on their own. Classification models perform poorly on the minority class when the datasets present other difficulties, such as class overlap and complex decision borders. Data complexity measures can identify such difficulties and thereby help in dealing with imbalanced datasets. They capture information about data overlap, neighborhood, and linearity. Although these measures were recently decomposed by class to deal with imbalanced datasets, their high computational cost prevents their use in time-restricted applications, such as recommendation systems, and on high-dimensional datasets. In this paper, we use a Meta-Learning approach to estimate the decomposed data complexity measures. We show that the simulated measures assess the difficulty of the dataset after applying preprocessing techniques to different sample sizes. We also show that this approach is significantly faster than computing the original measures, with a statistically similar estimation error for both classes.
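To make the idea of a class-decomposed neighborhood measure concrete, the following is a minimal sketch, not the authors' implementation. It assumes one possible measure in the spirit of the abstract: a leave-one-out 1-nearest-neighbor error rate reported separately per class. The function name `per_class_loo_1nn_error` and the toy data are hypothetical, and the abstract does not state which measures the paper uses. The sketch shows only the direct, quadratic-time computation that motivates a faster estimate; it does not implement the Meta-Learning estimator.

```python
import math


def per_class_loo_1nn_error(X, y):
    """Leave-one-out 1-NN error rate, computed separately for each class.

    X: list of numeric feature tuples; y: list of class labels.
    Returns {label: fraction of that class's points whose nearest
    other point carries a different label}.
    """
    errors = {}
    counts = {}
    for i, (xi, yi) in enumerate(zip(X, y)):
        # Nearest neighbour among all other points: O(n) per point,
        # hence O(n^2) distance evaluations for the whole dataset.
        nearest = min(
            (j for j in range(len(X)) if j != i),
            key=lambda j: math.dist(xi, X[j]),
        )
        counts[yi] = counts.get(yi, 0) + 1
        errors[yi] = errors.get(yi, 0) + (y[nearest] != yi)
    return {label: errors[label] / counts[label] for label in counts}


# Hypothetical toy data: six majority points and three minority points,
# one of which lies inside the majority region (class overlap).
X = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5), (0.2, 0.8),
    (0.45, 0.55), (5.0, 5.0), (5.1, 5.0),
]
y = ["maj"] * 6 + ["min"] * 3

result = per_class_loo_1nn_error(X, y)
print(result)
```

On this toy data the minority class has an error of 1/3 and the majority class an error of 1/6, while the pooled error over all nine points is 2/9. Reporting the measure by class exposes the difficulty concentrated in the minority class, which the single pooled value partly hides.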

Cite (APA)

Barella, V. H., Garcia, L. P. F., & de Carvalho, A. C. P. L. F. (2020). Simulating Complexity Measures on Imbalanced Datasets. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 12320 LNAI, pp. 498–512). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-030-61380-8_34
