Medical Imbalanced Data Classification Based on Random Forests

Engy El-shafeiy; Amr Abohany

Conference Proceedings

Advances in Intelligent Systems and Computing (2020) 1153 AISC 81-91

DOI: 10.1007/978-3-030-44289-7_8

Citations: 5

Readers: 8

Abstract

This paper studies the problem of class imbalance in medical datasets. Modern machine learning techniques are increasingly popular for this type of problem, with applications in health and medicine. One of the major difficulties with these techniques is that the data handled are highly imbalanced. Under-sampling and over-sampling techniques are used to work around this problem. In this paper, we apply random forests, which are combinations of decision trees fitted to subsamples of the data, built using under-sampling and over-sampling. At the end of the work, we compare the fit metrics obtained for the various model specifications tested and evaluate their results both in sample and out of sample. We observed that random forests using imbalanced sub-samples smaller than the original sample performed best among the random forests tested and improved on the approach currently practiced on the medical dataset.
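The two resampling strategies named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`class_indices`, `undersample`, `oversample`) and the toy labels are assumptions made for illustration, and only the Python standard library is used. In a random forest, each tree would be fitted on a sample drawn in one of these ways.

```python
import random
from collections import Counter


def class_indices(labels):
    """Group row indices by class label."""
    groups = {}
    for i, y in enumerate(labels):
        groups.setdefault(y, []).append(i)
    return groups


def undersample(labels, rng):
    """Draw, without replacement, as many rows from each class as the rarest class has."""
    groups = class_indices(labels)
    n = min(len(idx) for idx in groups.values())
    picked = []
    for idx in groups.values():
        picked.extend(rng.sample(idx, n))
    return picked


def oversample(labels, rng):
    """Keep every row and resample smaller classes, with replacement, up to the largest class size."""
    groups = class_indices(labels)
    n = max(len(idx) for idx in groups.values())
    picked = []
    for idx in groups.values():
        picked.extend(idx)
        picked.extend(rng.choices(idx, k=n - len(idx)))
    return picked


# Toy labels (made up for illustration): 90 negatives, 10 positives.
labels = [0] * 90 + [1] * 10
rng = random.Random(0)

under = undersample(labels, rng)
over = oversample(labels, rng)

under_counts = Counter(labels[i] for i in under)
over_counts = Counter(labels[i] for i in over)
print(under_counts, over_counts)
```

Under-sampling yields a training sample smaller than the original data, while over-sampling yields a larger one. Which class ratio and sample size the paper's best-performing forests used is not stated in the abstract, so the fully balanced ratio here is only one possible choice.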

Citation (APA)

El-shafeiy, E., & Abohany, A. (2020). Medical Imbalanced Data Classification Based on Random Forests. In Advances in Intelligent Systems and Computing (Vol. 1153 AISC, pp. 81–91). Springer. https://doi.org/10.1007/978-3-030-44289-7_8
