Parallel mining of top-k frequent itemsets in very large text database

Yongheng Wang; Yan Jia; Shuqiang Yang

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Vol. 3739 LNCS, pp. 706-712 (2005)

DOI: 10.1007/11563952_68

Abstract

Frequent itemset mining is a common and useful task in data mining, but most current mining algorithms cannot be applied to very large text databases. In this paper, we propose a novel and efficient parallel algorithm, parTFI, which finds the top-k frequent itemsets of a specified minimum length in a very large text database. Based on a simple data structure, H-struct, parTFI uses a novel logical vertical data partition technique to mine top-k frequent itemsets on each mining server in parallel. Our performance study shows that when processing very large sparse text databases, parTFI outperforms Apriori and FP-growth, two efficient frequent itemset mining algorithms, even when both are run with a better-tuned min_support. Furthermore, by creating the H-struct dynamically, parTFI can handle even huge datasets that most other algorithms cannot process. © Springer-Verlag Berlin Heidelberg 2005.
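
To make the mining task in the abstract concrete, the following is a minimal brute-force sketch of what "top-k frequent itemsets with a specified minimum length" means. It is not parTFI: it uses no H-struct, no vertical partitioning, and no parallelism, and it would not scale to a large database. The function name `top_k_frequent_itemsets`, its parameters, and the sample documents are all assumptions made for illustration. It shows that the user supplies k and a minimum length rather than a min_support threshold.

```python
from collections import Counter
from itertools import combinations


def top_k_frequent_itemsets(transactions, k, min_length, max_length=None):
    """Return the k itemsets (with at least min_length items) of highest support.

    Hypothetical brute-force helper: it enumerates every candidate itemset of
    each transaction, so it is only suitable for tiny examples.
    """
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))  # deduplicate and fix an item order
        upper = len(items) if max_length is None else min(max_length, len(items))
        for size in range(min_length, upper + 1):
            for itemset in combinations(items, size):
                counts[itemset] += 1  # support = number of transactions containing it
    # Rank by descending support; break ties lexicographically so the
    # result is deterministic.
    ranked = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    return ranked[:k]


# Each "transaction" is the set of terms in one (made-up) text document.
docs = [
    ["data", "mining", "text"],
    ["data", "mining", "parallel"],
    ["data", "text"],
    ["mining", "parallel", "text"],
]
result = top_k_frequent_itemsets(docs, k=3, min_length=2)
for itemset, support in result:
    print(itemset, support)
```

Because the caller fixes k instead of a support threshold, no tuning of min_support is needed; the effective threshold is simply the support of the k-th ranked itemset.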

Citation (APA)

Wang, Y., Jia, Y., & Yang, S. (2005). Parallel mining of top-k frequent itemsets in very large text database. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 3739 LNCS, pp. 706–712). Springer Verlag. https://doi.org/10.1007/11563952_68