Classification Accuracy Comparison for Imbalanced Datasets with Its Balanced Counterparts Obtained by Different Sampling Techniques


Abstract

Machine learning (ML) is accurate and reliable for supervised problems such as classification when training is performed appropriately for the predefined classes. In real-world scenarios, class imbalance may arise during dataset creation: one class has a very large number of instances while the other has very few. In other words, the class distribution is not equal. Such scenarios produce anomalous predictions. Imbalanced datasets must therefore be handled so that predictions consider all classes in an equal ratio. The paper surveys the external and internal techniques for balancing datasets found in the literature and presents an experimental analysis of four datasets from different domains: medical, mining, security, and finance. The experiments are done in Python. Four external balancing techniques are used to balance the datasets: two oversampling techniques (SMOTE and ADASYN) and two undersampling techniques (Random Undersampling and Near Miss). The datasets are used for a binary classification task. Three machine learning classification algorithms (logistic regression, random forest, and decision tree) are applied to the imbalanced and balanced datasets to compare and contrast their performance. Results for both the balanced and the imbalanced datasets are reported. Undersampling is found to discard many important data points and therefore predicts with low accuracy. For all the datasets, the oversampling technique ADASYN is observed to make reasonable predictions with an appropriate balance.
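To make the two families of external balancing concrete, here is a minimal sketch in Python. It is not the authors' code, and the abstract does not say which libraries they used. The function names (random_undersample, smote_like_oversample), the toy data, and the parameter k are all assumptions made for illustration. The oversampler implements only the core interpolation idea behind SMOTE, not the full published algorithm, and it assumes a binary task with at least two minority rows.

```python
import random
from collections import Counter


def random_undersample(X, y, seed=0):
    """Randomly drop rows until every class has as many rows as the smallest class."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    n_min = min(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        for row in rng.sample(rows, n_min):
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


def smote_like_oversample(X, y, k=3, seed=0):
    """Add synthetic minority rows by interpolating between a minority row
    and one of its k nearest minority neighbours (the core idea of SMOTE)."""
    rng = random.Random(seed)
    counts = Counter(y)
    minority = min(counts, key=counts.get)
    n_needed = max(counts.values()) - counts[minority]
    minority_rows = [row for row, label in zip(X, y) if label == minority]
    X_out, y_out = list(X), list(y)
    for _ in range(n_needed):
        base = rng.choice(minority_rows)
        # Nearest minority neighbours by squared Euclidean distance.
        neighbours = sorted(
            (r for r in minority_rows if r is not base),
            key=lambda r: sum((a - b) ** 2 for a, b in zip(base, r)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        X_out.append([a + gap * (b - a) for a, b in zip(base, other)])
        y_out.append(minority)
    return X_out, y_out


# Hypothetical toy data: 90 majority rows (label 0) and 10 minority rows (label 1).
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(90)]
X += [[data_rng.gauss(3, 1), data_rng.gauss(3, 1)] for _ in range(10)]
y = [0] * 90 + [1] * 10

X_under, y_under = random_undersample(X, y)
X_over, y_over = smote_like_oversample(X, y)
print("undersampled:", Counter(y_under))
print("oversampled:", Counter(y_over))
```

On this toy data, undersampling keeps only 20 of the 100 rows, while oversampling grows the set to 180 rows. That difference in retained data is the trade-off the abstract points to when it reports that undersampling loses important data points.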

Cite

Goswami, T., & Roy, U. B. (2021). Classification Accuracy Comparison for Imbalanced Datasets with Its Balanced Counterparts Obtained by Different Sampling Techniques. In Lecture Notes in Electrical Engineering (Vol. 698, pp. 45–54). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-981-15-7961-5_5
