The Random Forest (RF) algorithm consists of an assembly of base decision trees, constructed from Bootstrap subsets of the original dataset. Each subset is a sample of instances (rows) by a random subset of features (variables or columns) of the original dataset to be classified. In RF, pruning is not applied in the generation of base trees and in the classification process of a new record, each tree issues a vote enabling the selected class to be defined, as that with the most votes. Bearing in mind that in the state of the art it is defined that random feature selection for constructing the Bootstrap subsets decreases the quality of the results achieved with RF, in this work the integration of covering arrays (CA) in RF is proposed to solve this situation, in an algorithm called RFCA. In RFCA, the number N of rows of the CA defines the lowest number of base trees that require to be generated in RF and each row of the CA defines the features that each Bootstrap subset will use in the creation of each tree. To evaluate the new proposal, 32 datasets available in the UCI repository are used and compared with the RF available in Weka. The experiments show that the use of a CA of strength 2 to 7 obtains promising results in terms of accuracy.
CITATION STYLE
Vivas, S., Cobos, C., & Mendoza, M. (2019). Covering arrays to support the process of feature selection in the random forest classifier. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11331 LNCS, pp. 64–76). Springer Verlag. https://doi.org/10.1007/978-3-030-13709-0_6
Mendeley helps you to discover research relevant for your work.