Tidset-based parallel FP-tree algorithm for the frequent pattern mining problem on PC clusters


Abstract

Mining association rules from a transaction-oriented database is a classic problem in data mining. Frequent patterns are essential for generating association rules, time series analysis, classification, and related tasks. Algorithms for frequent pattern mining fall into two categories: the generate-and-test approach (Apriori-like) and the pattern growth approach (FP-tree). Because Apriori-like algorithms must scan the database many times, many recent methods use an FP-tree instead. However, even with the pattern growth method, execution time is long when the database is large or the given support is low. Parallel-distributed computing is a good strategy for solving this problem. Some parallel algorithms have been proposed; however, their execution time increases rapidly as the database grows or when the given minimum support threshold is small. In this study, an efficient parallel-distributed mining algorithm based on an FP-tree structure, the Tidset-based Parallel FP-tree (TPFP-tree), is proposed. To exchange transactions efficiently, a transaction identification set (Tidset) is used to select transactions directly, without scanning the database. The algorithm was verified on a Linux cluster with 16 computing nodes and compared with a PFP-tree algorithm. Datasets generated by IBM's Quest Synthetic Data Generator were used to evaluate the performance of the algorithms. The experimental results showed that the TPFP-tree algorithm can reduce the execution time when the database grows. Moreover, it scaled better than the PFP-tree. © 2008 Springer-Verlag Berlin Heidelberg.
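
The Tidset idea mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' TPFP-tree implementation: it only shows, under assumed names (`build_tidsets`, `select_transactions`) and a made-up toy database, how an index from each item to the IDs of the transactions containing it lets a node pick out transactions directly instead of rescanning the database.

```python
from collections import defaultdict


def build_tidsets(transactions):
    """Map each item to the set of transaction IDs containing it (one pass)."""
    tidsets = defaultdict(set)
    for tid, items in transactions.items():
        for item in items:
            tidsets[item].add(tid)
    return dict(tidsets)


def select_transactions(transactions, tidsets, item):
    """Look up the transactions containing `item` through its tidset.

    Only the listed transaction IDs are touched; the database is not rescanned.
    """
    return {tid: transactions[tid] for tid in sorted(tidsets.get(item, ()))}


# Hypothetical toy database: transaction ID -> items (illustration only).
transactions = {
    1: ["a", "b", "c"],
    2: ["b", "d"],
    3: ["a", "c"],
    4: ["c", "d"],
}

tidsets = build_tidsets(transactions)

# The support of an item is simply the size of its tidset.
support = {item: len(tids) for item, tids in tidsets.items()}

# Transactions a node might need to exchange for item "a" (assumed scenario).
to_send = select_transactions(transactions, tidsets, "a")
```

In this sketch the index is built in a single pass over the data, after which both support counting and transaction selection are lookups. How the paper distributes items across nodes and builds the local FP-trees is not described in the abstract and is not modelled here.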

Cite

Zhou, J., & Yu, K. M. (2008). Tidset-based parallel FP-tree algorithm for the frequent pattern mining problem on PC clusters. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5036 LNCS, pp. 18–28). https://doi.org/10.1007/978-3-540-68083-3_5
