Abstract
Random forest is a flexible algorithm with a wide range of applications, and it performs well on many data sets. It relies on few statistical assumptions, requires little preprocessing, and can handle large data sets with high dimensionality and missing values. Nevertheless, random forest struggles with high-cardinality categorical variables, unbalanced data, time series forecasting, and variable interpretation, and it is sensitive to hyperparameters. Random forest is therefore well suited to high-dimensional data, data with missing values, and large amounts of previously unprocessed data, as well as settings without prior statistical assumptions. It is less suitable when the data contain endogenous temporal effects or high-cardinality categorical variables, or when interpretation is the primary goal. Despite these shortcomings, improvements remain possible. It would be more convenient for users to screen methods if a rating system gave each candidate algorithm an overall score based on the input data and the users' goals.
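As a concrete illustration of the strengths described above, the following is a minimal sketch of fitting a random forest to a high-dimensional data set. It uses scikit-learn's RandomForestClassifier, which the paper does not prescribe; the data set, parameter values, and feature counts are illustrative assumptions, not the paper's experiment.

```python
# Sketch: random forest on high-dimensional synthetic data (assumed setup).
# No scaling or distributional assumptions are applied before fitting,
# reflecting the low preprocessing burden noted in the abstract.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# High-dimensional data: 500 samples, 100 features, only 10 informative.
X, y = make_classification(n_samples=500, n_features=100,
                           n_informative=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# n_estimators and other hyperparameters matter in practice; the abstract
# notes that random forest is sensitive to such choices.
clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)
acc = clf.score(X_test, y_test)

# Impurity-based importances give a rough variable ranking, though
# interpretation remains one of the method's noted weaknesses.
top = clf.feature_importances_.argsort()[::-1][:5]
print(f"accuracy={acc:.2f}, top features={top.tolist()}")
```

A grid search over `n_estimators`, `max_depth`, and `max_features` would typically follow, given the method's hyperparameter sensitivity.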
Citation
Zhu, T. (2020). Analysis on the applicability of the random forest. In Journal of Physics: Conference Series (Vol. 1607). Institute of Physics Publishing. https://doi.org/10.1088/1742-6596/1607/1/012123