Frequent itemset mining is a common and useful task in data mining, but most current mining algorithms cannot be applied to very large text databases. In this paper, we propose parTFI, a novel and efficient parallel algorithm that finds the top-k frequent itemsets of a specified minimum length in very large text databases. Based on a simple data structure, H-struct, parTFI uses a novel logical vertical data partitioning technique to mine top-k frequent itemsets at each mining server in parallel. Our performance study shows that, when processing very large sparse text databases, parTFI outperforms Apriori and FP-growth, two efficient frequent itemset mining algorithms, even when both run with a better-tuned min_support. Furthermore, by creating the H-struct dynamically, parTFI can handle even huge datasets that most other algorithms cannot process. © Springer-Verlag Berlin Heidelberg 2005.
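To make the task itself concrete, the following is a minimal brute-force sketch of what "top-k frequent itemsets with a specified minimum length" means. It is not parTFI and does not use H-struct or any partitioning; the function name `top_k_frequent_itemsets`, its parameters, and the toy documents are assumptions introduced only for illustration. Note that no min_support threshold is needed: the k best itemsets are simply ranked by support.

```python
from collections import Counter
from itertools import combinations


def top_k_frequent_itemsets(transactions, k, min_length):
    """Brute-force reference (hypothetical helper, not the paper's method).

    Counts every itemset of size >= min_length that occurs in some
    transaction, then returns the k itemsets with the highest support.
    Exponential in transaction length, so only suitable for tiny inputs.
    """
    counts = Counter()
    for transaction in transactions:
        # Deduplicate and sort so each itemset has one canonical tuple form.
        items = sorted(set(transaction))
        for size in range(min_length, len(items) + 1):
            for itemset in combinations(items, size):
                counts[itemset] += 1
    # Rank by descending support; break ties lexicographically.
    ranked = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    return ranked[:k]


# Toy "text database": each document is treated as a set of words.
docs = [
    ["data", "mining", "text"],
    ["data", "mining"],
    ["data", "text"],
    ["mining", "text", "parallel"],
]

result = top_k_frequent_itemsets(docs, k=3, min_length=2)
for itemset, support in result:
    print(itemset, support)
```

On real text collections this enumeration is infeasible, which is the gap that a parallel, partition-based approach such as the one described above is meant to address.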
CITATION STYLE
Wang, Y., Jia, Y., & Yang, S. (2005). Parallel mining of top-k frequent itemsets in very large text database. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 3739 LNCS, pp. 706–712). Springer Verlag. https://doi.org/10.1007/11563952_68