A Performance Prediction Model for Spark Applications

Abstract

Apache Spark is a popular open-source distributed processing framework that enables efficient processing of massive amounts of data. It exposes a large number of configuration parameters that strongly affect performance. For a given application, Spark performance can vary significantly with the input data type and size, the design and implementation of the algorithm, the available computational resources, and the parameter configuration. The interplay of all these variables makes performance prediction very difficult. In this paper, we take all of these variables into account and learn a machine-learning-based performance prediction model. We ran extensive experiments on a selected set of Spark applications covering the most common workloads to generate a representative dataset of execution times. In addition, we extracted application and data features to build a machine-learning-based performance model that predicts the execution time of Spark applications. The experiments show that boosting algorithms achieved better results than the other algorithms.
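To make the modeling setup concrete, the sketch below shows how a boosting regressor can be trained to predict job execution time from configuration and data features. The feature names (input size, executor count, executor memory, shuffle partitions) and the synthetic dataset are illustrative assumptions, not the paper's actual feature set or benchmark data:

```python
# Hypothetical sketch: predicting Spark job execution time (seconds)
# from configuration/data features with a gradient-boosting regressor.
# All feature names and the synthetic data are assumptions for illustration.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 500

# Illustrative features: input size (GB), number of executors,
# executor memory (GB), spark.sql.shuffle.partitions.
X = np.column_stack([
    rng.uniform(1, 100, n),       # input_size_gb
    rng.integers(2, 32, n),       # num_executors
    rng.integers(2, 16, n),       # executor_mem_gb
    rng.integers(50, 400, n),     # shuffle_partitions
])

# Synthetic target: runtime grows with data size, shrinks with
# parallelism, plus noise. A real dataset would come from measured runs.
y = 30 + 5 * X[:, 0] / X[:, 1] + rng.normal(0, 2, n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = GradientBoostingRegressor(random_state=0).fit(X_tr, y_tr)
print(f"R^2 on held-out jobs: {model.score(X_te, y_te):.3f}")
```

In this toy setup the boosted trees capture the nonlinear size/parallelism interaction without manual feature crosses, which is one reason boosting methods tend to do well on this kind of tabular performance data.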

Citation (APA)

Javaid, M. U., Kanoun, A. A., Demesmaeker, F., Ghrab, A., & Skhiri, S. (2020). A Performance Prediction Model for Spark Applications. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 12402 LNCS, pp. 13–22). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-030-59612-5_2
