Research on optimization of random forest algorithm based on Spark

Abstract

As society has developed, various industries have generated increasing amounts of data. The random forest algorithm is a widely used classification algorithm because of its superior performance. However, when generating feature subspaces, it selects features by simple random sampling, which cannot distinguish redundant features and therefore reduces its classification accuracy. In stand-alone mode, its computational efficiency is also low. To address these problems, the present paper conducts related optimization research based on Spark. The improved random forest algorithm performs feature extraction according to the calculated feature importance to form a feature subspace. When generating the random forest model, it selects decision trees based on the similarity between different decision trees and on their classification accuracy. Experimental results show that, compared with the original random forest algorithm, the improved algorithm proposed in the present paper achieves higher classification accuracy and can classify data effectively.
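The two ideas in the abstract can be illustrated with a minimal sketch in plain Python. The abstract does not give the paper's importance measure, similarity measure, or selection rule, so everything here is an assumption made for illustration: the function names, the importance-weighted sampling, the use of prediction agreement as "similarity", the greedy accuracy-first selection, and the `max_similarity` threshold are all hypothetical and are not the authors' method. The sketch also runs on a single machine and does not use Spark.

```python
import random


def sample_feature_subspace(importances, k, rng):
    """Draw k distinct feature indices, favoring higher-importance features.

    Assumption: sampling is weighted by importance and done without
    replacement. The paper's actual rule is not stated in the abstract.
    """
    remaining = dict(enumerate(importances))
    chosen = []
    for _ in range(min(k, len(remaining))):
        idxs = list(remaining)
        # Tiny offset keeps the total weight positive if all importances are 0.
        weights = [remaining[i] + 1e-12 for i in idxs]
        pick = rng.choices(idxs, weights=weights, k=1)[0]
        chosen.append(pick)
        del remaining[pick]
    return sorted(chosen)


def accuracy(preds, labels):
    """Fraction of samples a tree classifies correctly."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def agreement(preds_a, preds_b):
    """Fraction of samples on which two trees predict the same label.

    Assumption: used here as a stand-in for tree similarity.
    """
    return sum(a == b for a, b in zip(preds_a, preds_b)) / len(preds_a)


def select_trees(tree_preds, labels, max_similarity=0.9):
    """Greedily keep accurate trees that are not near-duplicates.

    Trees are visited from most to least accurate; a tree is kept only if
    its agreement with every already-kept tree is at most max_similarity.
    """
    order = sorted(
        range(len(tree_preds)),
        key=lambda t: accuracy(tree_preds[t], labels),
        reverse=True,
    )
    kept = []
    for t in order:
        if all(agreement(tree_preds[t], tree_preds[s]) <= max_similarity
               for s in kept):
            kept.append(t)
    return kept


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical importance scores for four features.
    print(sample_feature_subspace([0.5, 0.0, 0.3, 0.2], k=2, rng=rng))

    # Hypothetical validation labels and per-tree predictions.
    labels = [1, 0, 1, 1, 0, 1]
    tree_preds = [
        [1, 0, 1, 1, 0, 1],  # accurate tree
        [1, 0, 1, 1, 0, 1],  # exact duplicate of the first tree
        [1, 0, 0, 1, 1, 1],  # less accurate but different
    ]
    print(select_trees(tree_preds, labels))
```

In this toy input the duplicate tree is dropped because it agrees with the first tree on every sample, while the third tree is kept despite its lower accuracy because it adds diversity. A distributed version would compute the per-tree predictions in parallel, which is where a framework such as Spark would come in.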

Cite

Wang, S., Zhang, Z., Geng, S., & Pang, C. (2022). Research on optimization of random forest algorithm based on spark. Computers, Materials and Continua, 71(2), 3721–3731. https://doi.org/10.32604/cmc.2022.015378
