Overlap-Based Undersampling Method for Classification of Imbalanced Medical Datasets

Pattaramon Vuttipittayamongkol; Eyad Elyan

Conference Proceedings (Open Access)

IFIP Advances in Information and Communication Technology (2020), Vol. 584 IFIP, pp. 358–369

DOI: 10.1007/978-3-030-49186-4_30

31 Citations

33 Readers

Abstract

Early diagnosis of life-threatening diseases such as cancer and heart disease is crucial for effective treatment. Supervised machine learning has proved to be a useful tool for this purpose: historical patient data, including clinical and demographic information, is used to train learning algorithms, which build predictive models that provide initial diagnoses. However, in the medical domain the positive class is commonly under-represented in a dataset. In such a scenario, a typical learning algorithm tends to be biased towards the negative class, which is the majority class, and to misclassify positive cases. This is known as the class imbalance problem. In this paper, a framework for predictive diagnostics of diseases with imbalanced records is presented. To reduce the classification bias, we propose using an overlap-based undersampling method to improve the visibility of minority class samples in the region where the two classes overlap. This is achieved by detecting and removing negative class instances from the overlapping region, which improves class separability in the data space. Experimental results show high accuracy on the positive class, which is highly preferable in the medical domain, while good trade-offs between sensitivity and specificity were obtained. Results also show that the method often outperformed other state-of-the-art and well-established techniques.
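The abstract does not say how the overlapping region is detected, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a simple neighbourhood rule: a negative instance counts as overlapping if it is among the `k` nearest neighbours of any positive instance. The function name `remove_overlapping_negatives`, the parameter `k`, the Euclidean distance, and the toy data are all hypothetical choices made for illustration.

```python
import math


def remove_overlapping_negatives(X, y, k=3, positive=1):
    """Drop negative instances that lie among the k nearest
    neighbours of any positive instance (assumed overlap rule)."""
    positive_idx = [i for i, label in enumerate(y) if label == positive]
    to_remove = set()
    for i in positive_idx:
        # Euclidean distance from this positive instance to all others.
        neighbours = sorted(
            (math.dist(X[i], X[j]), j) for j in range(len(X)) if j != i
        )
        for _, j in neighbours[:k]:
            if y[j] != positive:
                to_remove.add(j)  # negative inside a positive neighbourhood
    keep = [i for i in range(len(X)) if i not in to_remove]
    return [X[i] for i in keep], [y[i] for i in keep]


# Toy data (made up): four negatives far from the positives,
# two positives, and two negatives sitting next to the positives.
X = [
    (0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (0.5, 0.5),
    (5.0, 5.0), (5.2, 5.0),
    (5.1, 5.1), (4.9, 5.0),
]
y = [0, 0, 0, 0, 1, 1, 0, 0]

X_res, y_res = remove_overlapping_negatives(X, y, k=2)
```

In this toy case only the two negatives adjacent to the positive cluster are removed; every positive instance and the distant negatives are kept, which is the sense in which removing overlapping negatives can make the minority class more visible to a classifier.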

Citation (APA)

Vuttipittayamongkol, P., & Elyan, E. (2020). Overlap-Based Undersampling Method for Classification of Imbalanced Medical Datasets. In IFIP Advances in Information and Communication Technology (Vol. 584 IFIP, pp. 358–369). Springer. https://doi.org/10.1007/978-3-030-49186-4_30
