Distributed Evolutionary Feature Selection for Big Data Processing

0Citations
Citations of this article
9Readers
Mendeley users who have this article in their library.
Get full text

Abstract

Feature selection has become a powerful dimensional reduction strategy and an effective tool in handling high-dimensional data. Feature selection aims to reduce the dimension of the feature space, to speed up and reduce the cost of the learning model and that by selecting the most relevant feature subset to data mining and machine learning tasks. The selection of optimal feature subset is an optimization problem that proved to be NP-hard. Metaheuristics are traditionally used to deal with NP-hard problems since they are well known for solving complex and real-world problems in reasonable period of time. Genetic algorithm (GA) is one of the most popular metaheuristics algorithms, which proved to be effective for an accurate feature selection task. However, in the last few decades, data have become progressively larger in both numbers of instances and features. This paradigm is being popularly termed as Big Data. With the tremendous growth of dataset sizes, most current feature selection algorithms and exceptionally GA become unscalable. To improve the scalability of a feature selection algorithm on big data, the distributed computing strategy is always adopted such as MapReduce model and Hadoop system. In this paper, we first present a review for the most recent works which handle the use of Parallel Genetic algorithm in large datasets. Then, we will propose a new Parallel Genetic algorithm based on the Coarse-grained parallelization model (island model). The parallelization of the process and the distribution of the partitioning of data will be performed using Hadoop system with an Amazon cluster. The performance and the scalability of the proposed method were theoretically and empirically compared to the existing feature selection methods when handling large-scale datasets and results confirm the effectiveness of our proposed method.

Cite

CITATION STYLE

APA

Bouaguel, W., & Ben Ncir, C. E. (2022). Distributed Evolutionary Feature Selection for Big Data Processing. Vietnam Journal of Computer Science, 9(3), 313–332. https://doi.org/10.1142/S2196888822500154

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free