A Caching-Based Parallel FP-Growth in Apache Spark

Abstract

Association-rule-based recommendation is widespread in big data applications that require quick responses to improve the user experience. Spark is a widely used distributed computing platform that accelerates the processing of large-scale distributed data, and developing appropriate distributed algorithms for Spark is essential to reducing the processing time of distributed recommendation. The existing FP-Growth in Spark is a popular parallel recommendation method, but it achieves its best performance only when the memory of the machines can hold all intermediate Resilient Distributed Datasets (RDDs). However, the memory in many practical data centers is still not large enough for large data sets. Therefore, this paper proposes a caching-based parallel FP-Growth that combines an integer-based sorting scheme and an RDD-caching strategy to improve efficiency. Experimental results show that the proposal reduces execution time by 32.37% on average compared with the existing parallel FP-Growth in Spark. Furthermore, the impact of several important parameters on the performance of the proposal is analyzed through numerous realistic experiments in Spark.
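
The abstract names an integer-based sorting step without describing it. As a rough illustration only, the Python sketch below shows one common way such a step can work in parallel FP-growth: items are ranked by descending support, each transaction is rewritten as a sorted list of integer ranks, and the encoded transactions are split into group-dependent prefixes in the style of PFP (parallel FP-growth). The function names, the tie-breaking rule, and the `rank % num_groups` grouping are assumptions made for this example, not the authors' implementation. The sketch uses plain Python lists instead of Spark RDDs, so the RDD-caching strategy is not modeled.

```python
from collections import Counter
from typing import Dict, List, Sequence


def rank_items(transactions: Sequence[Sequence[str]], min_count: int) -> Dict[str, int]:
    """Map each frequent item to an integer rank (0 = most frequent).

    Support counts an item at most once per transaction. Ties are broken by
    item name (an assumption) so that the ranking is deterministic.
    """
    counts = Counter(item for t in transactions for item in set(t))
    frequent = [(item, c) for item, c in counts.items() if c >= min_count]
    frequent.sort(key=lambda ic: (-ic[1], ic[0]))
    return {item: rank for rank, (item, _) in enumerate(frequent)}


def encode(transaction: Sequence[str], item_to_rank: Dict[str, int]) -> List[int]:
    """Drop infrequent items and return the remaining ranks in ascending order.

    Ascending rank order equals descending-support order, so a plain integer
    sort replaces a sort keyed on (support, item).
    """
    return sorted({item_to_rank[i] for i in transaction if i in item_to_rank})


def conditional_transactions(
    encoded: Sequence[List[int]], num_groups: int
) -> Dict[int, List[List[int]]]:
    """PFP-style sharding (hypothetical grouping: group = rank % num_groups).

    Scan each encoded transaction from its least frequent item backwards and,
    the first time a group is seen, send that group the prefix ending there.
    """
    out: Dict[int, List[List[int]]] = {}
    for ranks in encoded:
        seen = set()
        for j in range(len(ranks) - 1, -1, -1):
            group = ranks[j] % num_groups
            if group not in seen:
                seen.add(group)
                out.setdefault(group, []).append(ranks[: j + 1])
    return out


def owned_item_support(
    group_txns: Sequence[List[int]], group: int, num_groups: int
) -> Dict[int, int]:
    """Count, inside one group, the support of the ranks that group owns."""
    counts: Counter = Counter()
    for ranks in group_txns:
        for r in ranks:
            if r % num_groups == group:
                counts[r] += 1
    return dict(counts)


if __name__ == "__main__":
    txns = [["a", "b", "c"], ["a", "c"], ["a", "d"], ["b", "c"], ["a", "b", "c", "e"]]
    ranks = rank_items(txns, min_count=2)
    encoded = [encode(t, ranks) for t in txns]
    groups = conditional_transactions(encoded, num_groups=2)
    print(ranks)    # {'a': 0, 'c': 1, 'b': 2}
    print(encoded)  # [[0, 1, 2], [0, 1], [0], [1, 2], [0, 1, 2]]
    print({g: owned_item_support(t, g, 2) for g, t in groups.items()})
    # {0: {0: 4, 2: 3}, 1: {1: 4}}
```

Because each group receives the prefix ending at the last item it owns, every group can count the full support of its own items locally, which is what allows per-group FP-trees to be mined independently. In a Spark version, the encoding and sharding steps would be RDD transformations, and deciding which of their outputs to keep in memory is presumably where an RDD-caching strategy would apply.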

Citation (APA)

Cai, Z., Zhu, X., Zheng, Y., Liu, D., & Xu, L. (2018). A caching-based parallel FP-growth in Apache Spark. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11336 LNCS, pp. 519–533). Springer Verlag. https://doi.org/10.1007/978-3-030-05057-3_39