Random Forest is an ensemble method that combines many decision trees, each built from a bootstrap sample of the original data. It is used for both classification and regression and offers several advantages: high accuracy, an estimate of the generalization error, identification of important variables and outliers, support for both supervised and unsupervised learning, and imputation of missing values with an algorithm based on the proximity matrix. In this study, we compared the proximity-based imputation method of Random Forest with k-nearest-neighbor imputation applied prior to model fitting. To this end, simulation studies were performed for a classification problem under various scenarios, including different percentages of missing values, numbers of neighbors, and correlation structures among the predictor variables. The results showed that proximity-matrix-based imputation should be used when the predictors are highly correlated, whereas k-nearest-neighbor imputation should be preferred for low and medium correlation structures.
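To make the two imputation strategies concrete, the following is a minimal Python sketch and not the authors' code. It assumes continuous predictors with missing entries encoded as `None`. The function names `knn_impute` and `proximity_impute_step` are hypothetical. The proximity matrix is assumed to be precomputed from a fitted forest, for example as the fraction of trees in which two observations fall in the same terminal node. In the usual proximity-based procedure this update would be repeated after refitting the forest on the filled data; only a single update is shown here.

```python
import math


def knn_impute(rows, k=3):
    """Fill None entries with the mean of that feature over the k nearest rows."""

    def distance(a, b):
        # Euclidean distance over features observed in both rows,
        # rescaled by the share of features actually compared.
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        squared = sum((x - y) ** 2 for x, y in shared)
        return math.sqrt(squared * len(a) / len(shared))

    imputed = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors: every other row with feature j observed.
            donors = sorted(
                (distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            )
            nearest = [v for d, v in donors[:k] if d < math.inf]
            if nearest:
                imputed[i][j] = sum(nearest) / len(nearest)
    return imputed


def proximity_impute_step(rows, filled, proximity):
    """Replace each originally missing entry with the proximity-weighted
    mean of the observed values of that feature (one update only)."""
    updated = [list(r) for r in filled]
    n = len(rows)
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            pairs = [
                (proximity[i][m], rows[m][j])
                for m in range(n)
                if m != i and rows[m][j] is not None
            ]
            total = sum(w for w, _ in pairs)
            if total > 0:
                updated[i][j] = sum(w * v for w, v in pairs) / total
    return updated
```

The k-nearest-neighbor variant relies only on distances in the predictor space, while the proximity variant weights donors by how often the forest groups them with the incomplete observation. This difference is one possible reason the two methods could behave differently across correlation structures.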
Özen, H., & Bal, C. (2019). Rasgele Orman Yönteminde Eksik Veri Probleminin İncelenmesi. OSMANGAZİ JOURNAL OF MEDICINE, 00. https://doi.org/10.20515/otd.496524