Efficient streaming detection of hidden clusters in big data using subspace stream clustering

Abstract

Recently, many data mining techniques have been revisited to cope with the new challenges of big data. Nearly all of these algorithms focused on efficiency, so that mining can keep pace with the increasing size of the data. However, as the dimensionality of the data increases, not only the efficiency but also the effectiveness of traditional mining algorithms is compromised. For instance, as the dimensionality grows, clusters hidden in some subspaces become hard to detect with traditional clustering algorithms. In this paper, we address both the huge size and the high dimensionality of big data with a novel three-phase model for subspace stream clustering algorithms. In its first phase, the model overcomes the huge size of big data by continuously applying a streaming concept over the data objects and summarizing them into micro-clusters. Then, after each batch of data or upon a user request, the second phase is applied to the micro-clusters to reconstruct the current distribution of the data from the current summaries. In the third phase, a subspace clustering algorithm is applied to overcome the high dimensionality of the data and to find clusters hidden in some subspaces. An extensive evaluation study over a big data set is performed for different scenarios that follow our model. © 2014 Springer-Verlag Berlin Heidelberg.
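The three phases described in the abstract can be illustrated with a small, self-contained Python sketch. This is not the authors' implementation; it is one possible instance under our own assumptions. Phase 1 keeps CluStream-style micro-clusters (count, per-dimension linear sum and squared sum) and absorbs each object into the nearest one within a hypothetical `max_radius`. Phase 2 regenerates each micro-cluster's points from an axis-parallel Gaussian built from those sums. Phase 3 uses a simplified CLIQUE-style grid search (hypothetical parameters `xi` for intervals per dimension and `tau` for the density threshold) as a stand-in for whichever subspace clustering algorithm is plugged in. All function names and the synthetic data are hypothetical, and phases 2 and 3 run once at the end, standing in for "after a batch or upon a user request".

```python
import math
import random
from collections import Counter
from itertools import combinations


class MicroCluster:
    """Cluster feature (count, per-dimension linear sum and squared sum), as in CluStream."""

    def __init__(self, x):
        self.n = 1
        self.ls = list(x)
        self.ss = [v * v for v in x]

    def absorb(self, x):
        self.n += 1
        for d, v in enumerate(x):
            self.ls[d] += v
            self.ss[d] += v * v

    def center(self):
        return [s / self.n for s in self.ls]

    def std(self):
        # max(0, ...) guards against tiny negative variances caused by rounding.
        return [math.sqrt(max(0.0, sq / self.n - (s / self.n) ** 2))
                for s, sq in zip(self.ls, self.ss)]


def phase1_summarize(stream, max_radius):
    """Phase 1: one pass; absorb each object into the nearest micro-cluster within max_radius."""
    mcs = []
    for x in stream:
        best, best_dist = None, float("inf")
        for mc in mcs:
            dist = math.dist(x, mc.center())
            if dist < best_dist:
                best, best_dist = mc, dist
        if best is not None and best_dist <= max_radius:
            best.absorb(x)
        else:
            mcs.append(MicroCluster(x))  # no close summary yet: open a new micro-cluster
    return mcs


def phase2_reconstruct(mcs, rng):
    """Phase 2: regenerate mc.n points per micro-cluster from an axis-parallel Gaussian."""
    points = []
    for mc in mcs:
        mu, sigma = mc.center(), mc.std()
        for _ in range(mc.n):
            points.append([rng.gauss(m, s) for m, s in zip(mu, sigma)])
    return points


def phase3_dense_subspaces(points, xi, tau, lo=0.0, hi=1.0, max_dim=2):
    """Phase 3: CLIQUE-style bottom-up search for subspaces that contain dense grid units."""
    n, dims = len(points), len(points[0])
    # Map each coordinate to one of xi equal-width intervals over [lo, hi], clamping outliers.
    cells = [[min(xi - 1, max(0, int((v - lo) / (hi - lo) * xi))) for v in p] for p in points]
    result, candidates = {}, [(d,) for d in range(dims)]
    for k in range(1, max_dim + 1):
        if k > 1:
            # Apriori join: a k-D subspace can only hold a dense unit if its (k-1)-D projections do.
            prev = sorted(s for s in result if len(s) == k - 1)
            candidates = [a + (b[-1],) for a, b in combinations(prev, 2) if a[:-1] == b[:-1]]
        for sub in candidates:
            counts = Counter(tuple(c[d] for d in sub) for c in cells)
            units = {u for u, cnt in counts.items() if cnt / n > tau}  # unit holds > tau of all points
            if units:
                result[sub] = units
    return result


def make_stream(rng):
    """Hypothetical data: 400 objects clustered only in dims 0 and 1, plus 200 uniform noise objects."""
    hidden = [[rng.gauss(0.45, 0.01), rng.gauss(0.45, 0.01), rng.random(), rng.random()]
              for _ in range(400)]
    noise = [[rng.random() for _ in range(4)] for _ in range(200)]
    stream = hidden + noise
    rng.shuffle(stream)
    return stream


if __name__ == "__main__":
    rng = random.Random(7)
    mcs = phase1_summarize(make_stream(rng), max_radius=0.1)
    found = phase3_dense_subspaces(phase2_reconstruct(mcs, rng), xi=10, tau=0.2)
    print(len(mcs), "micro-clusters; subspaces with dense units:", sorted(found))
```

On the synthetic stream, the hidden cluster should surface as the two-dimensional subspace `(0, 1)` even though phase 3 never sees the raw objects, only points regenerated from the micro-cluster summaries. Two simplifications are worth flagging: the micro-cluster maintenance has no time decay, merging, or memory budget (unlike CluStream or DenStream), and phase 3 stops at dense units, whereas CLIQUE would go on to connect adjacent dense units into clusters.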

Citation (APA)

Hassani, M., & Seidl, T. (2014). Efficient streaming detection of hidden clusters in big data using subspace stream clustering. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8505 LNCS, pp. 146–160). Springer Verlag. https://doi.org/10.1007/978-3-662-43984-5_11