A distributed frequent itemsets mining algorithm using sparse boolean matrix on Spark

Abstract

Frequent itemsets mining is one of the most important tasks in data mining for finding interesting knowledge in large volumes of data. However, traditional frequent itemsets mining algorithms are usually both data-intensive and computation-intensive. Take the Apriori algorithm, a well-known algorithm for finding frequent itemsets, as an example: it must scan the dataset many times, and with the arrival of the big data era, it becomes very time-consuming on GB-scale data. To address these problems, researchers have made great efforts to improve the Apriori algorithm on the distributed computing frameworks Hadoop and Spark. However, the existing parallel Apriori algorithms based on Hadoop or Spark are still not efficient enough on GB-scale data. In this paper, we propose FISM, a distributed frequent itemsets mining algorithm that uses a sparse boolean matrix on Spark. Experiments show that FISM outperforms all other existing parallel frequent itemsets mining algorithms and can also handle GB-scale data.
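The abstract does not describe FISM's internals. As a rough, single-machine illustration of the general idea behind boolean-matrix itemset mining, the sketch below stores the transaction-by-item boolean matrix sparsely (for each item, only the IDs of transactions where its entry is 1) and computes an itemset's support by AND-ing, i.e. intersecting, those columns. The function names, the Apriori-style level-wise loop, and the sample data are assumptions for illustration only; this is not the authors' Spark implementation.

```python
from itertools import combinations


def build_sparse_boolean_matrix(transactions):
    """Column-wise sparse boolean matrix: item -> set of transaction IDs.

    M[t][item] == 1 iff transaction t contains item; only the 1-entries
    are stored, so the raw transactions are read exactly once here.
    """
    columns = {}
    for tid, transaction in enumerate(transactions):
        for item in set(transaction):
            columns.setdefault(item, set()).add(tid)
    return columns


def frequent_itemsets(transactions, min_support):
    """Apriori-style level-wise mining (illustrative assumption, not FISM).

    Support of an itemset = number of transactions in the AND
    (intersection) of its item columns. Returns {sorted tuple: support}.
    """
    columns = build_sparse_boolean_matrix(transactions)
    # Level 1: frequent single items, keyed by 1-tuples.
    current = {(item,): tids for item, tids in columns.items()
               if len(tids) >= min_support}
    result = {itemset: len(tids) for itemset, tids in current.items()}
    k = 2
    while current:
        next_level = {}
        for a, b in combinations(sorted(current), 2):
            # Join step: merge (k-1)-itemsets that share their first k-2 items.
            if a[:-1] != b[:-1]:
                continue
            candidate = a + (b[-1],)  # a < b, so candidate stays sorted
            # Prune step: every (k-1)-subset must already be frequent.
            if any(sub not in current for sub in combinations(candidate, k - 1)):
                continue
            tids = current[a] & current[b]  # boolean AND of the two columns
            if len(tids) >= min_support:
                next_level[candidate] = tids
        result.update({s: len(t) for s, t in next_level.items()})
        current = next_level
        k += 1
    return result


# Hypothetical toy dataset for illustration.
transactions = [
    ["bread", "milk"],
    ["bread", "diaper", "beer", "eggs"],
    ["milk", "diaper", "beer", "cola"],
    ["bread", "milk", "diaper", "beer"],
    ["bread", "milk", "diaper", "cola"],
]
result = frequent_itemsets(transactions, min_support=3)
print(sorted(result.items()))
```

Because every support count after the first pass works on the stored columns, the raw data is scanned once rather than once per level as in the classic Apriori formulation described in the abstract. How FISM partitions such a matrix across Spark workers is not covered here.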

Citation (APA)

Luo, Y., Yang, Z., Shi, H., & Zhang, Y. (2016). A distributed frequent itemsets mining algorithm using sparse boolean matrix on Spark. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9932 LNCS, pp. 419–423). Springer Verlag. https://doi.org/10.1007/978-3-319-45817-5_38