A Distributed Multi-source Feature Selection Using Spark


Abstract

Feature selection is one of the key problems in data pre-processing because it directly affects the performance of the data mining algorithm. High-dimensional data sets can often be described by multiple sources, each corresponding to a different knowledge source. Multi-source feature selection is therefore another topic relevant to large-scale data: learning and selecting features from multiple data sources is increasingly common and needed in many real-world applications. In this work, we propose a new multi-source feature selection method based on traditional filters, where the data sources contain the same set of instances but different sets of features. The method is implemented using Spark, a powerful parallel framework for large-scale data processing. The experiments conducted confirm the effectiveness of our approach in terms of execution time, while classification accuracy is maintained.


Zaghdoudi, B., Bouaguel, W., & Essoussi, N. (2020). A Distributed Multi-source Feature Selection Using Spark. In Advances in Intelligent Systems and Computing (Vol. 921, pp. 312–320). Springer Verlag. https://doi.org/10.1007/978-3-030-14118-9_31
