A distributed frequent itemsets mining algorithm using sparse boolean matrix on Spark

Yonghong Luo; Zhifan Yang; Huike Shi; Ying Zhang

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2016) 9932 LNCS 419-423

DOI: 10.1007/978-3-319-45817-5_38

Abstract

Frequent itemsets mining is one of the most important tasks in data mining for finding interesting knowledge in large volumes of data. However, traditional frequent itemsets mining algorithms are usually data-intensive and compute-intensive. Take Apriori, a well-known algorithm for finding frequent itemsets, as an example: it must scan the dataset many times, and in the big data era it takes a long time on GB-level data. To address these problems, researchers have made great efforts to improve Apriori using the distributed computing frameworks Hadoop and Spark. However, the existing parallel Apriori algorithms based on Hadoop or Spark are still not efficient enough on GB-level data. In this paper, we propose FISM, a distributed frequent itemsets mining algorithm that uses a sparse boolean matrix on Spark. Experiments show that FISM performs better than all other existing parallel frequent itemsets mining algorithms and can also handle GB-level data.
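To make the idea of a sparse boolean matrix concrete, the following is a minimal single-machine sketch in Python. It is not the FISM algorithm from the paper, whose details are not given in this abstract, and it does not use Spark. It only illustrates the general technique, assuming that each item's boolean column is stored sparsely as the set of transaction indices containing that item, so that the support of a candidate itemset can be computed by intersecting columns (a logical AND) without rescanning the dataset. The names `build_columns` and `frequent_itemsets` are hypothetical.

```python
from itertools import combinations


def build_columns(transactions):
    """Sparse boolean matrix: item -> set of row (transaction) indices that are True."""
    columns = {}
    for tid, items in enumerate(transactions):
        for item in items:
            columns.setdefault(item, set()).add(tid)
    return columns


def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) mining; returns {itemset: support count}."""
    columns = build_columns(transactions)  # the only pass over the raw data
    # Level 1: keep single items whose column has enough True entries.
    current = {
        frozenset([item]): tids
        for item, tids in columns.items()
        if len(tids) >= min_support
    }
    result = {}
    k = 1
    while current:
        result.update({itemset: len(tids) for itemset, tids in current.items()})
        k += 1
        next_level = {}
        for (a, tids_a), (b, tids_b) in combinations(current.items(), 2):
            candidate = a | b
            if len(candidate) != k or candidate in next_level:
                continue
            # Apriori pruning: every (k-1)-subset must itself be frequent.
            if any(frozenset(s) not in current for s in combinations(candidate, k - 1)):
                continue
            # AND of the two sparse columns gives the candidate's rows.
            tids = tids_a & tids_b
            if len(tids) >= min_support:
                next_level[candidate] = tids
        current = next_level
    return result


if __name__ == "__main__":
    data = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
    for itemset, support in frequent_itemsets(data, min_support=3).items():
        print(sorted(itemset), support)
```

In this sketch the raw transactions are read once to build the columns; every later level works only on set intersections, which is the property that makes a boolean-matrix representation attractive compared with repeated dataset scans.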

Cite

Luo, Y., Yang, Z., Shi, H., & Zhang, Y. (2016). A distributed frequent itemsets mining algorithm using sparse boolean matrix on Spark. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9932 LNCS, pp. 419–423). Springer Verlag. https://doi.org/10.1007/978-3-319-45817-5_38
