A Performance Prediction Model for Spark Applications

Abstract

Apache Spark is a popular open-source distributed processing framework that enables efficient processing of massive amounts of data. It exposes a large number of configuration parameters that strongly affect performance. For a given application, Spark performance can vary significantly with the input data type and size, the design and implementation of the algorithm, the available computational resources, and the parameter configuration. The interplay of all these variables makes performance prediction very difficult. In this paper, we take all of these variables into account and learn a machine-learning-based performance prediction model. We ran extensive experiments on a selected set of Spark applications covering the most common workloads to generate a representative dataset of execution times. In addition, we extracted application and data features to build a machine-learning-based performance model that predicts the execution time of Spark applications. The experiments show that boosting algorithms achieved better results than the other algorithms.
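To make the modeling setup concrete, the sketch below shows how a boosting regressor can be trained to predict job execution time from configuration and data features. The feature names (input size, executor count, executor memory, shuffle partitions) and the synthetic dataset are illustrative assumptions, not the paper's actual feature set or benchmark data:

```python
# Hypothetical sketch: predicting Spark job execution time (seconds)
# from configuration/data features with a gradient-boosting regressor.
# All feature names and the synthetic data are assumptions for illustration.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 500

# Illustrative features: input size (GB), number of executors,
# executor memory (GB), spark.sql.shuffle.partitions.
X = np.column_stack([
    rng.uniform(1, 100, n),       # input_size_gb
    rng.integers(2, 32, n),       # num_executors
    rng.integers(2, 16, n),       # executor_mem_gb
    rng.integers(50, 400, n),     # shuffle_partitions
])

# Synthetic target: runtime grows with data size, shrinks with
# parallelism, plus noise. A real dataset would come from measured runs.
y = 30 + 5 * X[:, 0] / X[:, 1] + rng.normal(0, 2, n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = GradientBoostingRegressor(random_state=0).fit(X_tr, y_tr)
print(f"R^2 on held-out jobs: {model.score(X_te, y_te):.3f}")
```

In this toy setup the boosted trees capture the nonlinear size/parallelism interaction without manual feature crosses, which is one reason boosting methods tend to do well on this kind of tabular performance data.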

Citation (APA)

Javaid, M. U., Kanoun, A. A., Demesmaeker, F., Ghrab, A., & Skhiri, S. (2020). A Performance Prediction Model for Spark Applications. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 12402 LNCS, pp. 13–22). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-030-59612-5_2
