A new feature sampling method in random forests for predicting high-dimensional data

Thanh Tung Nguyen; He Zhao; Joshua Zhexue Huang; Thuy Thi Nguyen; Mark Junjie Li

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2015) 9078 459-470

DOI: 10.1007/978-3-319-18032-8_36

Abstract

Random Forests (RF) models have been shown to perform well in both classification and regression. However, because randomization is used both in bagging the samples and in selecting features, the performance of RF can deteriorate on high-dimensional data. In this paper, we propose a new feature sampling approach for RF to deal with high-dimensional data. We first use p-values to assess feature importance and to find a cut-off between informative and less informative features. The set of informative features is then further partitioned into two groups, highly informative and informative features, using statistical measures. When sampling the feature subspace for learning RFs, features from all three groups are taken into account. The new subspace sampling method maintains the diversity and randomness of the forest and enables one to generate trees with lower prediction error. In addition, quantile regression is employed to obtain predictions in regression problems for robustness to outliers. The experimental results demonstrate that the proposed approach significantly reduces prediction errors and outperforms most existing random forests when dealing with high-dimensional data.
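The three-group subspace sampling described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the p-values are computed or which statistical measures split the informative features, so the names partition_features and sample_subspace, the cut-off alpha, the strong_fraction split, and the rule of drawing at least one feature per group are all assumptions made for illustration.

```python
import random


def partition_features(p_values, alpha=0.05, strong_fraction=0.5):
    """Split feature indices into three groups by p-value.

    Assumption: a smaller p-value means a more informative feature.
    `alpha` is a hypothetical cut-off, and `strong_fraction` is a
    hypothetical stand-in for the paper's unspecified statistical measure.
    """
    # Informative features, ordered from most to least informative.
    informative = sorted(
        (i for i, p in enumerate(p_values) if p <= alpha),
        key=lambda i: p_values[i],
    )
    # Everything above the cut-off is treated as less informative.
    weak = [i for i, p in enumerate(p_values) if p > alpha]
    n_high = int(len(informative) * strong_fraction)
    return informative[:n_high], informative[n_high:], weak


def sample_subspace(groups, mtry, rng):
    """Draw `mtry` distinct features so every non-empty group is represented.

    Assumption: one feature per group first, then uniform random fill.
    """
    non_empty = [g for g in groups if g]
    mtry = min(mtry, sum(len(g) for g in non_empty))
    # One feature from each group (as many groups as mtry allows).
    chosen = [rng.choice(g) for g in non_empty[:mtry]]
    # Fill the remaining slots uniformly from the features not yet chosen.
    remaining = [i for g in non_empty for i in g if i not in chosen]
    chosen += rng.sample(remaining, mtry - len(chosen))
    return sorted(chosen)


# Made-up p-values for six features, for illustration only.
p_values = [0.001, 0.2, 0.03, 0.0005, 0.9, 0.04]
groups = partition_features(p_values)
subspace = sample_subspace(groups, mtry=3, rng=random.Random(0))
print(groups, subspace)
```

In a forest, a subspace like this would be drawn afresh at each split (or for each tree), so that trees stay diverse while every candidate set still contains some informative features.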

Cite

Nguyen, T. T., Zhao, H., Huang, J. Z., Nguyen, T. T., & Li, M. J. (2015). A new feature sampling method in random forests for predicting high-dimensional data. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9078, pp. 459–470). Springer Verlag. https://doi.org/10.1007/978-3-319-18032-8_36