Faster cloud Star Joins with reduced disk spill and network communication

Jaqueline Joice Brito; Thiago Mosqueiro; Ricardo Rodrigues Ciferri; Cristina Dutra De Aguiar Ciferri

Conference Proceedings, Open Access

Procedia Computer Science (2016), Vol. 80, pp. 74–85

DOI: 10.1016/j.procs.2016.05.299

Abstract

Combining powerful parallel frameworks and on-demand commodity hardware, cloud computing has made both analytics and decision support systems canonical to enterprises of all sizes. Given the unprecedented volumes of data amassed by such companies, filtering and retrieving this data are pressing challenges. The data is often organized in star schemas, in which Star Joins are ubiquitous and expensive operations. In particular, excessive disk spill and network communication are tight bottlenecks for all current MapReduce or Spark solutions. Here, we propose two efficient solutions that reduce the computation time by at least 60%: the Spark Bloom-Filtered Cascade Join (SBFCJ) and the Spark Broadcast Join (SBJ). Conversely, a direct Spark implementation of a sequence of joins performs poorly, showcasing the importance of further filtering for minimal disk spill and network communication. Finally, while SBJ is twice as fast when memory per executor is large enough, SBFCJ is remarkably resilient to low-memory scenarios. Both algorithms are very competitive solutions to Star Joins in the cloud.
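The abstract names two strategies without showing them. As a rough illustration only, the following single-machine Python sketch shows the two general ideas the names point to: a broadcast-style hash join, where small filtered dimension tables are probed as in-memory lookups, and a Bloom-filter pre-filter that discards fact rows before a join. It is not the authors' Spark implementation. The table layout, column names (`date_id`, `store_id`, `measure`), sample values, and all function names are hypothetical and chosen for this example.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: never a false negative, occasionally a false positive."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key):
        # Derive num_hashes bit positions from seeded SHA-256 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def __contains__(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


def broadcast_star_join(fact_rows, dimensions):
    """Hash-join fact rows against small dimension tables held in memory.

    `dimensions` maps a foreign-key column to {primary_key: attribute} and is
    assumed to contain only the dimension rows that satisfy the query filters.
    """
    for row in fact_rows:
        attrs = []
        for fk, table in dimensions.items():
            if row[fk] not in table:
                break  # this fact row fails one dimension filter
            attrs.append(table[row[fk]])
        else:
            yield (*attrs, row["measure"])


def bloom_prefilter(fact_rows, dimensions, num_bits=1024):
    """Drop fact rows whose keys are definitely absent from some dimension."""
    filters = {}
    for fk, table in dimensions.items():
        bf = BloomFilter(num_bits)
        for pk in table:
            bf.add(pk)
        filters[fk] = bf
    return [r for r in fact_rows if all(r[fk] in bf for fk, bf in filters.items())]


# Hypothetical toy data: a fact table and two already-filtered dimensions.
fact = [
    {"date_id": 1, "store_id": 10, "measure": 5.0},
    {"date_id": 1, "store_id": 20, "measure": 7.0},
    {"date_id": 2, "store_id": 10, "measure": 3.0},
    {"date_id": 3, "store_id": 30, "measure": 9.0},
]
dims = {"date_id": {1: "2016-01"}, "store_id": {10: "A", 20: "B"}}

direct = list(broadcast_star_join(fact, dims))
survivors = bloom_prefilter(fact, dims)
filtered = list(broadcast_star_join(survivors, dims))
```

Because a Bloom filter has no false negatives, joining the pre-filtered rows gives the same result as joining the full fact table; any false positives are removed by the exact join that follows. In a distributed setting the appeal of pre-filtering is that fewer fact rows need to be shuffled or spilled, which is the bottleneck the abstract highlights.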

Citation (APA)

Brito, J. J., Mosqueiro, T., Ciferri, R. R., & De Aguiar Ciferri, C. D. (2016). Faster cloud Star Joins with reduced disk spill and network communication. In Procedia Computer Science (Vol. 80, pp. 74–85). Elsevier B.V. https://doi.org/10.1016/j.procs.2016.05.299
