Spark parameter tuning via trial-and-error


Abstract

Spark has been established as an attractive platform for big data analysis, since it manages to hide most of the complexities related to parallelism, fault tolerance and cluster setting from developers. However, this comes at the expense of having over 150 configurable parameters, the impact of which cannot be exhaustively examined due to the exponential number of their combinations. In this work, we investigate the impact of the most important tunable Spark parameters on application performance and guide developers on how to change the default values. We conduct a series of experiments and offer a trial-and-error methodology for tuning parameters in arbitrary applications based on evidence from a very small number of experimental runs. We test our methodology in three case studies, where we achieve speedups of more than 10 times.
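To make the idea of tuning from a small number of runs concrete, the following is a minimal sketch of a greedy trial-and-error loop. It is an illustration under stated assumptions, not the procedure from the paper: the function name `tune_by_trial_and_error`, the `run_job` callback (assumed to launch the application with a given configuration and return its runtime in seconds), and the `min_gain` threshold are all hypothetical. The keys in the usage note are real Spark configuration properties, but choosing them here is an assumption, since the abstract does not list the parameters studied.

```python
from typing import Callable, Dict, List, Tuple

Config = Dict[str, str]


def tune_by_trial_and_error(
    baseline: Config,
    candidates: List[Tuple[str, str]],
    run_job: Callable[[Config], float],
    min_gain: float = 0.05,
) -> Tuple[Config, float]:
    """Greedy sketch: try one parameter change per run, keep it only if it helps.

    `run_job` is a hypothetical callback that runs the application with the
    given configuration and returns the measured runtime in seconds.
    """
    best_config = dict(baseline)
    best_time = run_job(best_config)  # one run to measure the baseline

    for key, value in candidates:
        trial = dict(best_config)
        trial[key] = value
        elapsed = run_job(trial)  # exactly one extra run per candidate
        # Keep the change only if it beats the current best by at least
        # `min_gain` (a relative threshold, assumed here to absorb noise).
        if elapsed < best_time * (1.0 - min_gain):
            best_config, best_time = trial, elapsed

    return best_config, best_time
```

With a candidate list such as `[("spark.serializer", "org.apache.spark.serializer.KryoSerializer"), ("spark.shuffle.compress", "false")]`, the loop needs one run for the baseline plus one per candidate, so the number of runs grows linearly with the number of parameters tried rather than exponentially with their combinations.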

Cite

Petridis, P., Gounaris, A., & Torres, J. (2017). Spark parameter tuning via trial-and-error. In Advances in Intelligent Systems and Computing (Vol. 529, pp. 226–237). Springer Verlag. https://doi.org/10.1007/978-3-319-47898-2_24
