The accuracy of Random Forest performance can be improved by conducting a feature selection with a balancing strategy


Abstract

One of the significant purposes of building a model is to increase its accuracy within a shorter timeframe through the feature selection process. This is carried out by determining the importance of the available features in a dataset using Information Gain (IG). The process calculates the amount of information contained in each feature, and features with high values are selected to accelerate the performance of an algorithm. In selecting informative features, Information Gain uses a threshold value (cut-off). Therefore, this research aims to improve the time and accuracy of feature selection by integrating IG, the Fast Fourier Transform (FFT), and the Synthetic Minority Oversampling Technique (SMOTE). The feature selection model is then applied to Random Forest, a tree-based machine learning algorithm with random feature selection. A total of eight datasets, consisting of three balanced and five imbalanced datasets, were used to conduct this research. SMOTE was applied to the imbalanced datasets to balance the data. The results showed that feature selection using Information Gain, FFT, and SMOTE improved the performance accuracy of Random Forest.
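The pipeline described in the abstract can be sketched in scikit-learn terms. This is a minimal illustration, not the authors' implementation: it uses `mutual_info_classif` as the Information Gain score, the score mean as a hypothetical cut-off threshold, and a simplified hand-rolled SMOTE (interpolating between randomly paired minority samples rather than k-nearest neighbours). The FFT step is omitted because the abstract does not detail how it is applied; the dataset is synthetic.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced dataset (not one of the paper's eight datasets).
X, y = make_classification(n_samples=600, n_features=20, n_informative=5,
                           weights=[0.9, 0.1], random_state=0)

# Step 1: score features by Information Gain (mutual information) and keep
# those above a cut-off; the mean score is an illustrative threshold only.
ig = mutual_info_classif(X, y, random_state=0)
X_sel = X[:, ig >= ig.mean()]

# Step 2: simplified SMOTE -- synthesize minority samples by interpolating
# between randomly paired minority points (real SMOTE interpolates toward
# k-nearest neighbours).
rng = np.random.default_rng(0)
minority = X_sel[y == 1]
n_new = int((y == 0).sum() - (y == 1).sum())
a = minority[rng.integers(0, len(minority), n_new)]
b = minority[rng.integers(0, len(minority), n_new)]
X_bal = np.vstack([X_sel, a + rng.random((n_new, 1)) * (b - a)])
y_bal = np.concatenate([y, np.ones(n_new, dtype=int)])

# Step 3: train Random Forest on the selected, balanced features.
X_tr, X_te, y_tr, y_te = train_test_split(X_bal, y_bal, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
print(round(clf.score(X_te, y_te), 3))
```

In this sketch the IG cut-off reduces the feature count before training, and SMOTE equalizes the class counts, mirroring the two ideas the abstract combines.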

CITATION (APA)

Prasetiyowati, M. I., Maulidevi, N. U., & Surendro, K. (2022). The accuracy of Random Forest performance can be improved by conducting a feature selection with a balancing strategy. PeerJ Computer Science, 8. https://doi.org/10.7717/PEERJ-CS.1041
