Mining top-k largest tiles in a data stream

Abstract

Large tiles in a database are itemsets with the largest area, where the area of an itemset is its frequency in the database multiplied by its size. Mining these large tiles is an important pattern mining problem, since tiles with a large area describe a large part of the database. In this paper, we introduce the problem of mining the top-k largest tiles in a data stream under the sliding window model. We propose a candidate-based approach which summarizes the data stream and produces the top-k largest tiles efficiently for moderate window sizes. We also propose an approximation algorithm, with theoretical bounds on the error rate, to cope with large windows. In experiments with two real-life datasets, the approximation algorithm is up to a hundred times faster than the candidate-based solution and than baseline algorithms based on state-of-the-art solutions. We also investigate applications of large tile mining in computer vision and in monitoring emerging search topics. © 2014 Springer-Verlag.
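To make the definition of area concrete, the following is a minimal sketch of a naive baseline: it keeps the most recent transactions in a fixed-size window and enumerates every itemset that occurs in them. It is not the candidate-based or approximation algorithm from the paper, and it is exponential in transaction length. The names `tile_area`, `top_k_tiles`, and `SlidingWindow` are hypothetical and chosen only for illustration.

```python
from collections import deque
from itertools import combinations


def tile_area(itemset, transactions):
    """Area of a tile: (number of transactions containing it) * (its size)."""
    items = frozenset(itemset)
    frequency = sum(1 for t in transactions if items <= t)
    return frequency * len(items)


def top_k_tiles(transactions, k):
    """Brute-force top-k: score every itemset that occurs in some transaction.

    Assumption: exhaustive enumeration, only feasible for tiny inputs.
    """
    candidates = set()
    for t in transactions:
        for size in range(1, len(t) + 1):
            for combo in combinations(sorted(t), size):
                candidates.add(frozenset(combo))
    scored = [(tile_area(c, transactions), sorted(c)) for c in candidates]
    # Largest area first; ties broken by the sorted item list for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored[:k]


class SlidingWindow:
    """Keeps only the most recent `size` transactions of the stream."""

    def __init__(self, size):
        self.window = deque(maxlen=size)

    def push(self, transaction):
        self.window.append(frozenset(transaction))

    def top_k(self, k):
        return top_k_tiles(list(self.window), k)


if __name__ == "__main__":
    stream = [{"a", "b"}, {"a", "b", "c"}, {"a", "c"}, {"a", "b"}]
    w = SlidingWindow(size=3)
    for transaction in stream:
        w.push(transaction)
    print(w.top_k(2))
```

With a window of three transactions, the oldest transaction has been evicted by the time the fourth arrives, so the areas are computed over the last three only. In that window, `{a, b}` and `{a, c}` each occur twice and have size 2, giving area 4, which exceeds the area 3 of the more frequent singleton `{a}`.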

Lam, H. T., Pei, W., Prado, A., Jeudy, B., & Fromont, É. (2014). Mining top-k largest tiles in a data stream. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8725 LNAI, pp. 82–97). Springer Verlag. https://doi.org/10.1007/978-3-662-44851-9_6
