Spark has been established as an attractive platform for big data analysis, since it hides most of the complexities related to parallelism, fault tolerance and cluster setup from developers. However, this comes at the expense of having over 150 configurable parameters, whose impact cannot be exhaustively examined due to the exponential number of their combinations. In this work, we investigate the impact of the most important tunable Spark parameters on application performance and guide developers on how to change the default values. We conduct a series of experiments and offer a trial-and-error methodology for tuning parameters in arbitrary applications, based on evidence from a very small number of experimental runs. We test our methodology in three case studies, where we achieve speedups of more than 10 times.
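To give a concrete sense of what trial-and-error tuning with few runs can look like, here is a minimal sketch of a generic greedy, one-parameter-at-a-time search. It is not the procedure from the paper. The `greedy_tune` helper, the candidate lists and the `fake_run` cost model are hypothetical. The configuration keys (`spark.serializer`, `spark.shuffle.compress`, `spark.io.compression.codec`) are real Spark settings, but the baseline values and their effect on runtime are invented for illustration. In practice, `run` would launch the job (for example via `spark-submit --conf key=value ...`) and return its measured wall-clock time.

```python
from typing import Callable, Dict, List, Tuple

Config = Dict[str, str]


def greedy_tune(
    run: Callable[[Config], float],
    baseline: Config,
    candidates: Dict[str, List[str]],
) -> Tuple[Config, float]:
    """Try parameters one at a time, in the given order, and keep a
    change only if it lowers the measured runtime (greedy search)."""
    best = dict(baseline)
    best_time = run(best)
    for key, values in candidates.items():  # dicts keep insertion order
        for value in values:
            if best.get(key) == value:
                continue  # this setting has already been measured
            trial = {**best, key: value}
            t = run(trial)
            if t < best_time:
                best, best_time = trial, t
    return best, best_time


# Hypothetical cost model standing in for a real, timed Spark run.
def fake_run(conf: Config) -> float:
    t = 100.0
    if conf.get("spark.serializer") == "org.apache.spark.serializer.KryoSerializer":
        t -= 20.0
    if conf.get("spark.shuffle.compress") == "false":
        t += 15.0
    if conf.get("spark.io.compression.codec") == "snappy":
        t -= 5.0
    return t


# Illustrative starting point; not a claim about Spark's defaults.
baseline: Config = {
    "spark.serializer": "org.apache.spark.serializer.JavaSerializer",
    "spark.shuffle.compress": "true",
    "spark.io.compression.codec": "lz4",
}
candidates = {
    "spark.serializer": ["org.apache.spark.serializer.KryoSerializer"],
    "spark.shuffle.compress": ["false"],
    "spark.io.compression.codec": ["snappy", "lzf"],
}

best, best_time = greedy_tune(fake_run, baseline, candidates)
print(best_time, best)
```

The greedy order keeps the number of runs linear in the number of candidate values (five runs here) rather than exponential in the number of parameters. The trade-off is that interactions between parameters can be missed.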
Petridis, P., Gounaris, A., & Torres, J. (2017). Spark parameter tuning via trial-and-error. In Advances in Intelligent Systems and Computing (Vol. 529, pp. 226–237). Springer Verlag. https://doi.org/10.1007/978-3-319-47898-2_24