Which is better for frequent pattern mining: Approximate counting or sampling?

Abstract

We investigate the problem of finding frequent patterns in a continuous stream of transactions. Two approaches are prominent in the literature: (a) perform approximate counting (e.g., the lossy counting algorithm (LCA) of Manku and Motwani, VLDB 2002) using a lower support threshold than the one given by the user, or (b) maintain a running sample (e.g., reservoir sampling (Algo-Z) of Vitter, TOMS 1985) and generate frequent itemsets from the sample on demand. Both approaches have advantages and disadvantages. For instance, LCA is known to output all frequent itemsets (recall = 1), but it also outputs many false frequent itemsets (low precision). Sampling is fast, but it reports a large number of infrequent itemsets as frequent, particularly when the sample size is not large. Although both approaches are known to be practically useful, to the best of our knowledge there has been no comparison between them. In addition, we propose a novel sampling algorithm (DSS), which selects transactions for inclusion in the sample based on a histogram of single itemsets. An empirical comparison of the three algorithms is performed using synthetic and benchmark datasets. Results show that DSS is consistently more accurate than LCA and Algo-Z, whereas LCA performs consistently better than Algo-Z. Furthermore, although DSS requires more time than Algo-Z, it is faster than LCA. © 2009 Springer Berlin Heidelberg.
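
The two baseline approaches named in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: the reservoir sampler uses the simple textbook replacement rule (often called Algorithm R) rather than Vitter's faster Algorithm Z, the lossy counter tracks single items rather than itemsets, and every function and parameter name is hypothetical. DSS is not shown because the abstract does not specify it.

```python
import math
import random
from typing import Dict, Hashable, Iterable, List, Set, Tuple


def reservoir_sample(stream: Iterable, k: int, rng: random.Random) -> List:
    """Keep a uniform random sample of size k from a stream of unknown length.

    Simple replacement rule (Algorithm R), used here as a stand-in for Algo-Z.
    """
    reservoir: List = []
    for n, transaction in enumerate(stream, start=1):
        if n <= k:
            reservoir.append(transaction)
        else:
            j = rng.randrange(n)  # uniform integer in [0, n)
            if j < k:  # happens with probability k / n
                reservoir[j] = transaction
    return reservoir


def lossy_count(
    stream: Iterable[Hashable], epsilon: float
) -> Tuple[Dict[Hashable, Tuple[int, int]], int]:
    """Lossy counting over single items with error bound epsilon.

    Returns (entries, n), where entries maps item -> (count, max_undercount)
    and n is the number of stream elements seen.
    """
    width = math.ceil(1 / epsilon)  # bucket width
    entries: Dict[Hashable, Tuple[int, int]] = {}
    n = 0
    for item in stream:
        n += 1
        bucket = math.ceil(n / width)  # id of the current bucket
        if item in entries:
            count, delta = entries[item]
            entries[item] = (count + 1, delta)
        else:
            # The item may have been seen and pruned up to bucket - 1 times.
            entries[item] = (1, bucket - 1)
        if n % width == 0:  # bucket boundary: prune low-count entries
            entries = {
                key: (count, delta)
                for key, (count, delta) in entries.items()
                if count + delta > bucket
            }
    return entries, n


def frequent_items(
    entries: Dict[Hashable, Tuple[int, int]], n: int, support: float, epsilon: float
) -> Set[Hashable]:
    """Report items using the lowered threshold (support - epsilon) * n."""
    return {
        item for item, (count, _) in entries.items()
        if count >= (support - epsilon) * n
    }
```

Because the counter reports against the lowered threshold, an item whose true frequency lies between `support - epsilon` and `support` can be output as frequent. That is the source of the false positives the abstract attributes to LCA, while no truly frequent item is missed.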

Cite

Ng, W., & Dash, M. (2009). Which is better for frequent pattern mining: Approximate counting or sampling? In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5691 LNCS, pp. 151–162). https://doi.org/10.1007/978-3-642-03730-6_13
