A caching-based parallel FP-Growth in Apache Spark

Zhicheng Cai; Xingyu Zhu; Yuehui Zheng; Duan Liu; Lei Xu

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Vol. 11336 LNCS, pp. 519-533 (2018)

DOI: 10.1007/978-3-030-05057-3_39

Abstract

Association-rule-based recommendation is widespread in big data applications that need quick responses to improve user experience. Spark is a widely used distributed computing platform that accelerates the processing of large-scale distributed data. Developing appropriate distributed algorithms for Spark is essential to decrease the processing time of distributed recommendation. The existing FP-Growth in Spark is a popular parallel recommendation method, but it achieves its best performance only when the memory of the machines can accommodate all intermediate Resilient Distributed Datasets (RDDs). However, the memory of many practical data centers is still not large enough for large data sets. Therefore, this paper proposes a caching-based parallel FP-Growth, which combines integer-based sorting with an RDD-caching strategy to improve efficiency. Experimental results show that the proposal decreases execution time by 32.37% on average compared with the existing parallel FP-Growth in Spark. Furthermore, the impact of several important parameters on the performance of the proposal is analyzed through extensive realistic experiments in Spark.
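The abstract names integer-based sorting but does not describe how it works. One plausible reading, offered here only as an assumption, is that each frequent item is replaced by an integer rank (0 for the most frequent item), so that ordering the items of a transaction before FP-tree construction compares integers instead of strings. The sketch below illustrates that idea in plain Python rather than Spark. The function names `build_rank_table` and `encode` and the sample baskets are hypothetical and do not come from the paper, and the RDD-caching strategy is not shown.

```python
from collections import Counter


def build_rank_table(transactions, min_count):
    """Map each frequent item to an integer rank (0 = most frequent).

    Illustrative assumption: the paper's actual encoding may differ.
    """
    # Count each item once per transaction (support count).
    counts = Counter(item for t in transactions for item in set(t))
    frequent = [(item, c) for item, c in counts.items() if c >= min_count]
    # Descending support; ties broken by item name so the result is deterministic.
    frequent.sort(key=lambda pair: (-pair[1], pair[0]))
    return {item: rank for rank, (item, _) in enumerate(frequent)}


def encode(transaction, rank_table):
    """Drop infrequent items, replace the rest by ranks, and sort as integers."""
    return sorted({rank_table[i] for i in transaction if i in rank_table})


# Hypothetical sample baskets, used only to exercise the functions above.
transactions = [
    ["bread", "milk"],
    ["bread", "diapers", "beer", "eggs"],
    ["milk", "diapers", "beer", "cola"],
    ["bread", "milk", "diapers", "beer"],
    ["bread", "milk", "diapers", "cola"],
]

rank_table = build_rank_table(transactions, min_count=3)
encoded = [encode(t, rank_table) for t in transactions]

print(rank_table)
print(encoded)
```

After encoding, every transaction is a sorted list of small integers, which is the order an FP-tree insertion would consume. In a Spark job, the analogous step would be a map over the transaction RDD with the rank table shared across workers, though whether the paper does it this way is not stated in the abstract.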

Cite

Cai, Z., Zhu, X., Zheng, Y., Liu, D., & Xu, L. (2018). A caching-based parallel FP-Growth in Apache Spark. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11336 LNCS, pp. 519–533). Springer Verlag. https://doi.org/10.1007/978-3-030-05057-3_39
