Genetic programming (GP) is a powerful classification technique. It is interpretable, and it can dynamically build very complex expressions that maximize or minimize a fitness function. It can model very complex problems in machine learning, data mining and pattern recognition. Nevertheless, GP has a high computational cost. On the other hand, data standardization is one of the most important pre-processing steps in machine learning. Its purpose is to bring all input features onto a common scale so that they contribute equally to the model. The objective of this paper is to investigate how input data standardization methods influence GP, in particular its prediction accuracy. Six different standardization methods were compared in order to determine which one yields the most accurate result at the lowest computational cost. The simulations were run on ten benchmark datasets under three different scenarios (varying the population size and the number of generations). The results showed that the computational efficiency of GP improves substantially when it is coupled with certain standardization methods, specifically the Min-Max method in scenario I and the Vector method in scenarios II and III. In contrast, the Manhattan and Z-Score methods gave the worst results in all three scenarios.
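To make the idea of column-wise standardization concrete, the following is a minimal sketch of the four methods named in the abstract. The abstract does not give their formulas, so the definitions below are the commonly used ones and are an assumption, not necessarily those of the paper: Min-Max rescales to [0, 1], Z-Score subtracts the mean and divides by the (population) standard deviation, Vector divides by the Euclidean norm of the column, and Manhattan divides by the sum of absolute values. All function names are hypothetical, and the remaining two methods from the study are not shown because the abstract does not name them.

```python
import math
from statistics import mean, pstdev


def min_max(column):
    # Rescale to [0, 1]; a constant column maps to zeros (assumed convention).
    lo, hi = min(column), max(column)
    span = hi - lo
    return [0.0 if span == 0 else (x - lo) / span for x in column]


def z_score(column):
    # Zero mean, unit variance, using the population standard deviation (assumption).
    mu, sigma = mean(column), pstdev(column)
    return [0.0 if sigma == 0 else (x - mu) / sigma for x in column]


def vector_norm(column):
    # Divide by the Euclidean (L2) norm of the column (assumed definition).
    norm = math.sqrt(sum(x * x for x in column))
    return [0.0 if norm == 0 else x / norm for x in column]


def manhattan_norm(column):
    # Divide by the Manhattan (L1) norm of the column (assumed definition).
    norm = sum(abs(x) for x in column)
    return [0.0 if norm == 0 else x / norm for x in column]


def standardize(rows, method):
    # Apply one method independently to every feature (column) of a dataset
    # given as a list of equal-length rows.
    columns = [list(col) for col in zip(*rows)]
    scaled = [method(col) for col in columns]
    return [list(row) for row in zip(*scaled)]


if __name__ == "__main__":
    # Two features on very different scales.
    data = [[1.0, 100.0], [2.0, 300.0], [3.0, 500.0]]
    for method in (min_max, z_score, vector_norm, manhattan_norm):
        print(method.__name__, standardize(data, method))
```

Each method is applied per feature, so a feature measured in hundreds and one measured in units end up on comparable scales before being passed to the classifier.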
Al Shorman, A. R., Faris, H., Castillo, P. A., & Merelo, J. J. (2018). The Influence of Input Data Standardization Methods on the Prediction Accuracy of Genetic Programming Generated Classifiers. In International Joint Conference on Computational Intelligence (Vol. 1, pp. 79–85). Science and Technology Publications, Lda. https://doi.org/10.5220/0006959000790085