A Distributed Multi-source Feature Selection Using Spark

Bochra Zaghdoudi; Waad Bouaguel; Nadia Essoussi

Conference Proceedings


Advances in Intelligent Systems and Computing (2020), Vol. 921, pp. 312–320

DOI: 10.1007/978-3-030-14118-9_31


Abstract

Feature selection is one of the key problems in data pre-processing because it has an immediate effect on the data mining algorithm applied afterwards. High-dimensional data sets can be described through multiple sources, each corresponding to a different knowledge source. Multi-source feature selection is another topic relevant to large-scale data. Learning and selecting features from multiple data sources is increasingly common and needed in many real-world applications. In this work, we propose a new multi-source feature selection method based on traditional filters, for the setting in which the data sources contain the same set of instances but different sets of features. The method is implemented in Spark, a powerful parallel framework for large-scale data processing. The conducted experiments confirm the effectiveness of our approach in terms of execution time, while classification accuracy is maintained.
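The abstract does not specify which filters are used or how the per-source results are combined, so the following is only a minimal sketch of the general setting, not the authors' method. It assumes that each source is a set of feature columns over the same instances and shares one label vector. The filter score (absolute Pearson correlation with the label), the "keep the top k per source, then merge globally by score" rule, and all function names (`pearson_score`, `filter_source`, `multi_source_select`) are hypothetical choices made for illustration. A `ThreadPoolExecutor` stands in for distributing one independent task per source; it is not Spark.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean, pstdev


def pearson_score(xs, ys):
    """Absolute Pearson correlation between one feature column and the labels.

    Assumed filter criterion; returns 0.0 for a constant column."""
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        return 0.0
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return abs(cov / (sx * sy))


def filter_source(source, labels, k):
    """Local filter step: score every feature of one source and keep its k best."""
    scored = [(name, pearson_score(col, labels)) for name, col in source.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


def multi_source_select(sources, labels, k_per_source, k_total):
    """Filter each source independently, then merge the local winners globally.

    `sources` maps a source name to {feature name: column}; every column must
    cover the same instances, in the same order as `labels`."""
    n = len(labels)
    for source in sources.values():
        if any(len(col) != n for col in source.values()):
            raise ValueError("all sources must describe the same instances")

    def run_local(item):
        # One independent task per source (hypothetical merge key: source name).
        name, source = item
        return [(name, feat, score)
                for feat, score in filter_source(source, labels, k_per_source)]

    # Stand-in for distributing per-source tasks; a real deployment would use Spark.
    with ThreadPoolExecutor() as pool:
        local_results = list(pool.map(run_local, sources.items()))

    # Global step: rank all locally selected features by their filter score.
    merged = [entry for local in local_results for entry in local]
    merged.sort(key=lambda entry: entry[2], reverse=True)
    return [(src, feat) for src, feat, _ in merged[:k_total]]


if __name__ == "__main__":
    labels = [0, 0, 1, 1]
    sources = {
        "s1": {"a": [1, 2, 3, 4], "b": [5, 5, 5, 5]},
        "s2": {"c": [4, 2, 3, 1], "d": [0, 0, 0, 1]},
    }
    print(multi_source_select(sources, labels, k_per_source=1, k_total=2))
    # prints [('s1', 'a'), ('s2', 'd')]
```

Keeping the filter step local to each source means no task needs features from another source, which is what makes this kind of scheme easy to parallelize. The global merge compares scores across sources, which is only meaningful if the filter produces scores on a common scale, as the absolute correlation in [0, 1] does here.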


Citation (APA)

Zaghdoudi, B., Bouaguel, W., & Essoussi, N. (2020). A Distributed Multi-source Feature Selection Using Spark. In Advances in Intelligent Systems and Computing (Vol. 921, pp. 312–320). Springer Verlag. https://doi.org/10.1007/978-3-030-14118-9_31
