Parallel mining of top-k frequent itemsets in very large text database

2Citations
Citations of this article
8Readers
Mendeley users who have this article in their library.
Get full text

Abstract

Frequent itemsets mining is a common and useful task in data mining. But most of the current mining algorithms can't be used in very large text database. In this paper, we propose a novel and efficient parallel algorithm parTFI which is used to find top-k frequent itemsets with specified minimum length in very large text database. Base on a simple data structure H-struct, parTFI uses a novel logical vertical data partition technique to mine top-k frequent itemsets at each mining server parallel. Our performance study shows that when processing very large sparse text database, parTFI outperforms Apriori and FP-growth, two efficient frequent iemsets mining algorithms, even when both are running with the better tuned min_support. Furthermore, by creating Hstruct dynamically, parTFI can suit even huge dataset that most other algorithms can't process. © Springer-Verlag Berlin Heidelberg 2005.

Cite

CITATION STYLE

APA

Wang, Y., Jia, Y., & Yang, S. (2005). Parallel mining of top-k frequent itemsets in very large text database. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 3739 LNCS, pp. 706–712). Springer Verlag. https://doi.org/10.1007/11563952_68

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free