A novel algorithm for fast and scalable subspace clustering of high-dimensional data

35Citations
Citations of this article
58Readers
Mendeley users who have this article in their library.

This article is free to access.

Abstract

Rapid growth of high dimensional datasets in recent years has created an emergent need to extract the knowledge underlying them. Clustering is the process of automatically finding groups of similar data points in the space of the dimensions or attributes of a dataset. Finding clusters in the high dimensional datasets is an important and challenging data mining problem. Data group together differently under different subsets of dimensions, called subspaces. Quite often a dataset can be better understood by clustering it in its subspaces, a process called subspace clustering. But the exponential growth in the number of these subspaces with the dimensionality of data makes the whole process of subspace clustering computationally very expensive. There is a growing demand for efficient and scalable subspace clustering solutions in many Big data application domains like biology, computer vision, astronomy and social networking. Apriori based hierarchical clustering is a promising approach to find all possible higher dimensional subspace clusters from the lower dimensional clusters using a bottom-up process. However, the performance of the existing algorithms based on this approach deteriorates drastically with the increase in the number of dimensions. Most of these algorithms require multiple database scans and generate a large number of redundant subspace clusters, either implicitly or explicitly, during the clustering process. In this paper, we present SUBSCALE, a novel clustering algorithm to find non-trivial subspace clusters with minimal cost and it requires only k database scans for a k-dimensional data set. Our algorithm scales very well with the dimensionality of the dataset and is highly parallelizable. We present the details of the SUBSCALE algorithm and its evaluation in this paper.

Cite

CITATION STYLE

APA

Kaur, A., & Datta, A. (2015). A novel algorithm for fast and scalable subspace clustering of high-dimensional data. Journal of Big Data, 2(1). https://doi.org/10.1186/s40537-015-0027-y

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free