This paper studies the problem of class imbalance in medical datasets. Modern machine learning techniques are increasingly popular for this type of problem, with applications in health and medicine. One of the major difficulties with these techniques is that the data they handle are highly imbalanced. Under-sampling and over-sampling techniques are used to work around this problem. In this paper, we apply random forests, which are combinations of decision trees fitted to subsamples of the data, built using under-sampling and over-sampling. At the end of the work, we compare the fit metrics obtained under the various model specifications tested and evaluate their results both in sample and out of sample. We observed that random forests using imbalanced sub-samples smaller than the original sample performed best among the random forests tested and improved on the approach previously applied to the medical dataset.
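As a rough illustration of the two resampling ideas named in the abstract, the sketch below shows random under-sampling and random over-sampling in plain Python. It is not the authors' implementation: the function names (`random_under_sample`, `random_over_sample`), the toy 95/5 class split, and the choice to balance every class to the rarest or the most common class size are all assumptions made for this example. The random forest itself is left out; the resampled rows would then be passed to whatever forest implementation is in use.

```python
import random
from collections import Counter, defaultdict


def _group_by_label(rows, labels):
    """Collect rows into one list per class label."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[label].append(row)
    return groups


def random_under_sample(rows, labels, seed=0):
    """Hypothetical helper: keep a random subset of each class so that
    every class ends up with as many rows as the rarest class."""
    rng = random.Random(seed)
    groups = _group_by_label(rows, labels)
    target = min(len(group) for group in groups.values())
    out_rows, out_labels = [], []
    for label, group in groups.items():
        for row in rng.sample(group, target):  # sampling without replacement
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


def random_over_sample(rows, labels, seed=0):
    """Hypothetical helper: duplicate randomly chosen rows of each smaller
    class until every class has as many rows as the largest class."""
    rng = random.Random(seed)
    groups = _group_by_label(rows, labels)
    target = max(len(group) for group in groups.values())
    out_rows, out_labels = [], []
    for label, group in groups.items():
        # Draw the missing rows with replacement from the same class.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for row in group + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Toy data (an assumption for illustration): 95 negatives, 5 positives.
toy_rows = [[float(i)] for i in range(100)]
toy_labels = [0] * 95 + [1] * 5

under_rows, under_labels = random_under_sample(toy_rows, toy_labels)
over_rows, over_labels = random_over_sample(toy_rows, toy_labels)

under_counts = Counter(under_labels)
over_counts = Counter(over_labels)
```

Under-sampling discards majority-class rows and so shrinks the training set, while over-sampling keeps every original row and adds duplicates of minority-class rows. Which trade-off suits a given medical dataset is an empirical question of the kind the paper sets out to compare.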
CITATION STYLE
El-shafeiy, E., & Abohany, A. (2020). Medical Imbalanced Data Classification Based on Random Forests. In Advances in Intelligent Systems and Computing (Vol. 1153 AISC, pp. 81–91). Springer. https://doi.org/10.1007/978-3-030-44289-7_8