Data is highly diverse, not only in dimensionality but also in the variety of its datatypes. To extract the most useful information from datasets and to improve prediction accuracy, feature selection is of great importance in data mining. This paper proposes a hybrid feature selection methodology aimed at producing the most relevant feature subset and better prediction accuracy. It suggests a wrapper that combines a Genetic Algorithm (GA), used as a heuristic search tool, with Random Forest (RF), used as the predictive model, to exploit the optimization strength of GA and the predictive accuracy of RF. To create a reduced search space for the GA-RF wrapper, a set of filter methods is applied first: the filters assign weights to the features and retain only those that pass a threshold criterion. The proposed approach has been tested on the Breast Cancer dataset from the UCI repository and achieved 99.04% prediction accuracy. A small comparative study is also carried out to support the claim that coupling the genetic algorithm with random forests after search-space reduction outperforms other wrapper-based approaches.
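The abstract only outlines the two-stage pipeline, so the following is a minimal sketch of the general idea, not the authors' implementation. It assumes scikit-learn and NumPy, and uses scikit-learn's bundled copy of the UCI Wisconsin Diagnostic Breast Cancer data (the abstract does not say which UCI breast cancer dataset was used). The choice of filters (ANOVA F-score and mutual information), the mean-weight threshold, and every GA setting (truncation selection, single-point crossover, bit-flip mutation, population size, number of generations) are illustrative assumptions, and `filter_stage`, `fitness` and `ga_rf_wrapper` are hypothetical names. It is not expected to reproduce the reported 99.04%.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import f_classif, mutual_info_classif
from sklearn.model_selection import cross_val_score


def filter_stage(X, y, seed=0):
    """Filter stage (assumed design): combine several filter scores into one
    weight per feature and keep features whose weight is at least the mean."""
    scores = [f_classif(X, y)[0], mutual_info_classif(X, y, random_state=seed)]
    weights = np.zeros(X.shape[1])
    for s in scores:
        s = np.nan_to_num(s)
        # Min-max normalise each filter's scores so they are comparable.
        weights += (s - s.min()) / (s.max() - s.min() + 1e-12)
    weights /= len(scores)
    # Threshold criterion (assumption): keep features at or above the mean weight.
    return np.flatnonzero(weights >= weights.mean())


def fitness(mask, X, y, seed=0):
    """Wrapper fitness: 3-fold cross-validated Random Forest accuracy
    using only the features selected by the boolean mask."""
    if not mask.any():
        return 0.0
    rf = RandomForestClassifier(n_estimators=30, random_state=seed)
    return cross_val_score(rf, X[:, mask], y, cv=3).mean()


def ga_rf_wrapper(X, y, pop_size=8, generations=5, p_mut=0.1, seed=0):
    """Simple GA over binary feature masks, scored by the RF fitness above."""
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    pop = rng.random((pop_size, n)) < 0.5  # random binary chromosomes
    cache = {}  # avoid re-training RF on masks already evaluated

    def score(mask):
        key = mask.tobytes()
        if key not in cache:
            cache[key] = fitness(mask, X, y, seed)
        return cache[key]

    for _ in range(generations):
        scores = np.array([score(m) for m in pop])
        order = np.argsort(scores)[::-1]
        elite = pop[order[: pop_size // 2]]  # truncation selection + elitism
        children = []
        while len(children) < pop_size - len(elite):
            a, b = elite[rng.integers(len(elite), size=2)]
            cut = rng.integers(1, n)  # single-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            child ^= rng.random(n) < p_mut  # bit-flip mutation
            children.append(child)
        pop = np.vstack([elite] + children)
    best = max(pop, key=score)
    return best, score(best)


if __name__ == "__main__":
    X, y = load_breast_cancer(return_X_y=True)
    kept = filter_stage(X, y)  # reduced search space for the wrapper
    mask, acc = ga_rf_wrapper(X[:, kept], y)
    print(f"{len(kept)} features after filtering, "
          f"{mask.sum()} selected by GA-RF, CV accuracy {acc:.3f}")
```

Because the elite half of each generation is carried over unchanged, the best fitness found never decreases from one generation to the next; the filter stage matters mainly because it shrinks the chromosome length, and therefore the number of masks the GA must explore.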
Saqib, P., Qamar, U., Aslam, A., & Ahmad, A. (2019). Hybrid of Filters and Genetic Algorithm - Random Forests Based Wrapper Approach for Feature Selection and Prediction. In Advances in Intelligent Systems and Computing (Vol. 998, pp. 190–199). Springer Verlag. https://doi.org/10.1007/978-3-030-22868-2_15