Medical Imbalanced Data Classification Based on Random Forests

Abstract

This paper studies the problem of class imbalance in medical datasets. Modern machine learning techniques are becoming increasingly popular for this type of problem, with examples in the areas of health and medicine. One of the major difficulties with these techniques is that the data they handle are highly imbalanced. Under-sampling and over-sampling techniques are used to work around this problem. In this paper, we apply random forests, which are combinations of decision trees fitted to subsamples of the data, built using under-sampling and over-sampling. At the end of the work, we compare the fit metrics of the tested model specifications and evaluate their results in-sample and out-of-sample. We observed that random forests using imbalanced sub-samples smaller than the original sample performed best among the random forests tested and improved on the approach previously applied to the medical dataset.
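The two resampling strategies named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy labels, and the choice to balance every class exactly are assumptions made for illustration, and only the Python standard library is used. Each returned index list could then be used to select the rows on which one tree of a forest is fitted.

```python
import random
from collections import Counter


def split_by_class(labels):
    """Group row indices by class label."""
    groups = {}
    for i, y in enumerate(labels):
        groups.setdefault(y, []).append(i)
    return groups


def undersample(labels, rng):
    """Return row indices with every class cut down to the minority-class size."""
    groups = split_by_class(labels)
    n = min(len(idx) for idx in groups.values())
    picked = []
    for idx in groups.values():
        picked.extend(rng.sample(idx, n))  # draw without replacement
    return picked


def oversample(labels, rng):
    """Return row indices with every class grown to the majority-class size."""
    groups = split_by_class(labels)
    n = max(len(idx) for idx in groups.values())
    picked = []
    for idx in groups.values():
        picked.extend(idx)  # keep every original row
        picked.extend(rng.choices(idx, k=n - len(idx)))  # duplicates drawn with replacement
    return picked


# Hypothetical toy labels (assumption): 95 negatives (0) and 5 positives (1).
labels = [0] * 95 + [1] * 5
rng = random.Random(0)  # fixed seed so the draw is repeatable

under_idx = undersample(labels, rng)
over_idx = oversample(labels, rng)

under_counts = Counter(labels[i] for i in under_idx)
over_counts = Counter(labels[i] for i in over_idx)
print("under-sampled class counts:", dict(under_counts))
print("over-sampled class counts:", dict(over_counts))
```

Under-sampling discards majority-class rows and so yields a subsample smaller than the original data, while over-sampling repeats minority-class rows and yields a larger one. Drawing a fresh resample for each tree is one common way to combine either strategy with a forest.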

Cite

El-shafeiy, E., & Abohany, A. (2020). Medical Imbalanced Data Classification Based on Random Forests. In Advances in Intelligent Systems and Computing (Vol. 1153 AISC, pp. 81–91). Springer. https://doi.org/10.1007/978-3-030-44289-7_8
