Fast vertical mining using diffsets

  • Mohammed J. Zaki
  • Karam Gouda
Abstract

A number of vertical mining algorithms have recently been proposed for association mining; they have been shown to be very effective and usually outperform horizontal approaches. The main advantage of the vertical format is that it supports fast frequency counting via intersection operations on transaction ids (tids) and automatically prunes irrelevant data. The main problem with these approaches arises when the intermediate vertical tid lists become too large to fit in memory, which limits the algorithms' scalability. In this paper we present a novel vertical data representation called Diffset, which keeps track only of the differences between the tids of a candidate pattern and those of the frequent patterns that generate it. We show that diffsets drastically reduce the memory required to store intermediate results, and that incorporating diffsets into previous vertical mining methods increases their performance significantly.
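
The abstract contrasts tid-list intersection with diffsets only in prose. The following is a minimal sketch, assuming the identities commonly used in diffset-based Eclat (dEclat): within a prefix class P, d(PXY) = t(PX) - t(PY) when starting from tidsets, d(PXY) = d(PY) - d(PX) when starting from diffsets, and support(PXY) = support(PX) - |d(PXY)|. The toy database and all function names are hypothetical illustrations, not code from the paper.

```python
# Minimal sketch: tidset intersection vs. diffset subtraction for support counting.
# The toy database and every function name below are hypothetical illustrations.

def build_tidsets(db):
    """Convert a horizontal database {tid: items} to vertical tidsets {item: set of tids}."""
    tidsets = {}
    for tid, items in db.items():
        for item in items:
            tidsets.setdefault(item, set()).add(tid)
    return tidsets

def tidset_join(t_px, t_py):
    """Tidset format: t(PXY) = t(PX) & t(PY); support(PXY) = |t(PXY)|."""
    return t_px & t_py

def diffset_from_tidsets(t_px, t_py):
    """Switch from tidsets to diffsets: d(PXY) = t(PX) - t(PY)."""
    return t_px - t_py

def diffset_join(d_px, d_py):
    """Diffset format within prefix class P: d(PXY) = d(PY) - d(PX)."""
    return d_py - d_px

def diffset_support(support_px, d_pxy):
    """support(PXY) = support(PX) - |d(PXY)|."""
    return support_px - len(d_pxy)


db = {
    1: {"A", "C", "T", "W"},
    2: {"C", "D", "W"},
    3: {"A", "C", "T", "W"},
    4: {"A", "C", "D", "W"},
    5: {"A", "C", "D", "T", "W"},
    6: {"C", "D", "T"},
}
t = build_tidsets(db)

# Level 2 (empty prefix): itemset AT, computed both ways.
t_at = tidset_join(t["A"], t["T"])            # {1, 3, 5}
d_at = diffset_from_tidsets(t["A"], t["T"])   # {4}
sup_at = diffset_support(len(t["A"]), d_at)   # 4 - 1 = 3

# Level 3 (prefix class A): join AT with AD using diffsets only.
d_ad = diffset_from_tidsets(t["A"], t["D"])   # {1, 3}
d_atd = diffset_join(d_at, d_ad)              # d(AD) - d(AT) = {1, 3}
sup_atd = diffset_support(sup_at, d_atd)      # 3 - 2 = 1

print(sorted(t_at), sorted(d_at), sup_at)     # [1, 3, 5] [4] 3
print(sorted(d_atd), sup_atd)                 # [1, 3] 1
```

On this toy data the diffset for ATD has two tids while the tidset for AT already has three; the memory savings the paper reports come from diffsets of dense data tending to shrink as patterns grow, whereas tidsets of frequent patterns stay large.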
