Searching for Optimal Oversampling to Process Imbalanced Data: Generative Adversarial Networks and Synthetic Minority Over-Sampling Technique

  • Eom G
  • Byeon H
2Citations
Citations of this article
27Readers
Mendeley users who have this article in their library.

Abstract

Classification problems due to data imbalance occur in many fields and have long been studied in the machine learning field. Many real-world datasets suffer from the issue of class imbalance, which occurs when the sizes of classes are not uniform; thus, data belonging to the minority class are likely to be misclassified. It is particularly important to overcome this issue when dealing with medical data because class imbalance inevitably arises due to incidence rates within medical datasets. This study adjusted the imbalance ratio (IR) within the National Biobank of Korea dataset “Epidemiologic data of Parkinson’s disease dementia patients” to values of 6.8 (raw data), 9, and 19 and compared four traditional oversampling methods with techniques using the conditional generative adversarial network (CGAN) and conditional tabular generative adversarial network (CTGAN). The results showed that when the classes were balanced with CGAN and CTGAN, they showed a better classification performance than the more traditional oversampling techniques based on the AUC and F1-score. We were able to expand the application scope of GAN, widely used in unstructured data, to structured data. We also offer a better solution for the imbalanced data problem and suggest future research directions.

Cite

CITATION STYLE

APA

Eom, G., & Byeon, H. (2023). Searching for Optimal Oversampling to Process Imbalanced Data: Generative Adversarial Networks and Synthetic Minority Over-Sampling Technique. Mathematics, 11(16), 3605. https://doi.org/10.3390/math11163605

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free