In this paper, we compare the performance of a canonical genetic algorithm (CGA), the Self-Adaptive Genetic Algorithm (SAGA), and a feed-forward neural network (FFNN) on a predictive modeling problem with incomplete data. Predictive modeling involves learning relationships between the features and labels of the data points in a dataset. Missing input values can cause problems for some learning algorithms by biasing the learned models. Imputation refers to techniques that replace missing values with estimates derived from methods such as statistical probabilities, multivariate analysis, machine learning, or K-nearest neighbors. We study how imputed datasets affect the ability of CGA, SAGA, and FFNN to learn effective models. Results indicate that the choice of imputation method has little effect on CGA and SAGA performance but a noticeable effect on FFNN performance. All three algorithms perform similarly on data imputed by univariate strategies, but FFNN performs noticeably worse on data imputed by trained multivariate strategies. As the proportion of imputed data increases, test accuracy decreases for all three algorithms, while control accuracy remains surprisingly stable in all cases except FFNN with trained multivariate imputation. Interestingly, CGA and SAGA identify the most relevant input features even when a large amount of the data is imputed.
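The abstract contrasts univariate imputation, which fills each feature independently, with multivariate imputation, which uses the other features of a data point. The sketch below illustrates that distinction on a toy table. It is not the set of imputers used in the paper: the function names `mean_impute` and `knn_impute`, the use of `None` for missing values, and the K-nearest-neighbors variant (Euclidean distance over features observed in both rows, averaging the k nearest donors) are all illustrative assumptions.

```python
from math import sqrt
from typing import List, Optional

Row = List[Optional[float]]  # None marks a missing value (assumption)


def mean_impute(data: List[Row]) -> List[List[float]]:
    """Univariate: replace each missing value with its column's observed mean.
    Assumes every column has at least one observed value."""
    n_cols = len(data[0])
    means = []
    for j in range(n_cols):
        observed = [row[j] for row in data if row[j] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if v is None else v for j, v in enumerate(row)] for row in data]


def _distance(a: Row, b: Row) -> float:
    """Euclidean distance computed only over features observed in both rows."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return float("inf")
    return sqrt(sum((x - y) ** 2 for x, y in shared))


def knn_impute(data: List[Row], k: int = 2) -> List[List[float]]:
    """Multivariate: fill a missing value with the mean of that feature
    among the k nearest other rows that observe it."""
    result = []
    for i, row in enumerate(data):
        filled = []
        for j, v in enumerate(row):
            if v is not None:
                filled.append(v)
                continue
            donors = [(_distance(row, other), other[j])
                      for m, other in enumerate(data)
                      if m != i and other[j] is not None]
            donors.sort(key=lambda d: d[0])
            nearest = [val for _, val in donors[:k]]
            filled.append(sum(nearest) / len(nearest))
        result.append(filled)
    return result


# Hypothetical toy data: two clusters of similar rows, with one gap in each.
data: List[Row] = [
    [1.0, 2.0, 3.0],
    [1.1, None, 3.1],
    [5.0, 6.0, 7.0],
    [5.2, 6.1, None],
]

print(round(mean_impute(data)[1][1], 2))  # 4.7 (column mean, ignores row context)
print(knn_impute(data, k=1)[1][1])        # 2.0 (copied from the most similar row)
```

The univariate fill lands between the two clusters, whereas the neighbor-based fill copies a value from the row that most resembles the incomplete one. This is one way in which the choice of imputer can change the values a learner sees.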
Martinez, E. S., Maldonado, S. V., Wu, A. S., McMahan, R. P., Liu, X., & Oakley, B. (2022). Effects of imputation strategy on genetic algorithms and neural networks on a binary classification problem. In GECCO 2022 - Proceedings of the 2022 Genetic and Evolutionary Computation Conference (pp. 1272–1280). Association for Computing Machinery, Inc. https://doi.org/10.1145/3512290.3528863