Feature selection is one of the key problems in data pre-processing because it has an immediate effect on the data mining algorithm that follows. In high-dimensional data sets, the data can be described by multiple sources, each corresponding to a different knowledge source. Multi-source feature selection is another topic relevant to large-scale data: learning and selecting features from multiple data sources is becoming more common and is increasingly needed in many real-world applications. In this work, we propose a new multi-source feature selection method based on traditional filters, where the data sources contain the same set of instances but different sets of features. The method is implemented using Spark, a powerful parallel framework for large-scale data processing. The conducted experiments confirm the effectiveness of our approach in terms of execution time while maintaining classification accuracy.
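The abstract does not say which filter or merging strategy the method uses, so the following is only a rough sketch of the general idea under our own assumptions. It uses a Fisher-score filter and a global top-k merge, neither of which is necessarily the authors' choice. Because every source covers the same instances, each source can be scored independently (the parallel "map" step) and the per-source scores merged afterwards. A thread pool stands in for Spark here, and the names `fisher_score`, `score_source`, and `multi_source_select` are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Sequence, Tuple

# Hypothetical layout: feature name -> column of values over the shared instances.
Source = Dict[str, Sequence[float]]


def fisher_score(column: Sequence[float], labels: Sequence[int]) -> float:
    """Fisher score (assumed filter): between-class scatter / within-class scatter."""
    overall_mean = sum(column) / len(column)
    by_class: Dict[int, List[float]] = defaultdict(list)
    for value, label in zip(column, labels):
        by_class[label].append(value)
    between = 0.0
    within = 0.0
    for values in by_class.values():
        n_c = len(values)
        mean_c = sum(values) / n_c
        var_c = sum((v - mean_c) ** 2 for v in values) / n_c
        between += n_c * (mean_c - overall_mean) ** 2
        within += n_c * var_c
    if within == 0.0:
        # Perfectly separated (or constant) feature: avoid division by zero.
        return float("inf") if between > 0.0 else 0.0
    return between / within


def score_source(name: str, source: Source, labels: Sequence[int]) -> List[Tuple[float, str, str]]:
    """Score every feature of one source independently (the per-source 'map' step)."""
    return [(fisher_score(col, labels), name, feat) for feat, col in source.items()]


def multi_source_select(
    sources: Dict[str, Source], labels: Sequence[int], k: int
) -> List[Tuple[str, str, float]]:
    """Score all sources in parallel, then keep the k best features overall."""
    n = len(labels)
    for name, source in sources.items():
        for feat, col in source.items():
            if len(col) != n:
                raise ValueError(f"{name}.{feat} does not cover the shared instances")
    # The thread pool stands in for a distributed map over sources.
    with ThreadPoolExecutor() as pool:
        per_source = list(
            pool.map(lambda item: score_source(item[0], item[1], labels), sources.items())
        )
    # Merge step (assumed): one global ranking, ties broken by source and feature name.
    merged = [entry for scores in per_source for entry in scores]
    merged.sort(key=lambda e: (-e[0], e[1], e[2]))
    return [(name, feat, score) for score, name, feat in merged[:k]]


if __name__ == "__main__":
    labels = [0, 0, 1, 1]
    sources = {
        "clinical": {"age": [1.0, 2.0, 9.0, 10.0], "noise": [5.0, 1.0, 5.0, 1.0]},
        "genomic": {"g1": [3.0, 3.5, 6.0, 6.5]},
    }
    print(multi_source_select(sources, labels, k=2))
    # → [('clinical', 'age', 64.0), ('genomic', 'g1', 36.0)]
```

Scoring each source separately keeps the work embarrassingly parallel, since no feature from one source is needed to score a feature from another. The single global ranking is just one possible merge; a per-source quota or a rank aggregation would fit the same structure.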
Zaghdoudi, B., Bouaguel, W., & Essoussi, N. (2020). A Distributed Multi-source Feature Selection Using Spark. In Advances in Intelligent Systems and Computing (Vol. 921, pp. 312–320). Springer Verlag. https://doi.org/10.1007/978-3-030-14118-9_31