Constraint-based clustering in large databases

Anthony K.H. Tung; Jiawei Han; Laks V.S. Lakshmanan; Raymond T. Ng

Conference Proceedings


Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2001), vol. 1973, pp. 405-419

DOI: 10.1007/3-540-44503-x_26



Abstract

Constrained clustering, that is, finding clusters that satisfy user-specified constraints, is highly desirable in many applications. In this paper, we introduce the constrained clustering problem and show that traditional clustering algorithms (e.g., k-means) cannot handle it. A scalable constrained-clustering algorithm is developed in this study; it starts by finding an initial solution that satisfies the user-specified constraints and then refines the solution by performing confined object movements under those constraints. Our algorithm consists of two phases: pivot movement and deadlock resolution. For both phases, we show that finding the optimal solution is NP-hard. We then propose several heuristics and show how our algorithm can scale up for large data sets using the heuristic of micro-cluster sharing. Through experiments, we show the effectiveness and efficiency of the heuristics.
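The two-step idea in the abstract (find a feasible initial solution, then refine it with movements that never break the constraints) can be illustrated with a minimal sketch. This is not the authors' algorithm: it does not implement pivot movement, deadlock resolution, or micro-cluster sharing. It assumes a single hypothetical constraint, namely that every cluster must keep at least `min_size` objects, and the names `constrained_refine`, `centroid`, and `min_size` are illustrative only.

```python
import math
import random


def centroid(points):
    """Mean of a non-empty list of equal-length coordinate tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def constrained_refine(points, k, min_size, max_iters=100, seed=0):
    """Assign each point to one of k clusters, keeping >= min_size per cluster.

    Illustrative sketch only; the size constraint is an assumption made here.
    """
    if min_size < 1:
        raise ValueError("min_size must be at least 1")
    if k * min_size > len(points):
        raise ValueError("infeasible: k * min_size exceeds the number of points")

    # Step 1: feasible initial solution. Dealing shuffled points round-robin
    # gives every cluster at least floor(n / k) >= min_size members.
    rng = random.Random(seed)
    order = list(range(len(points)))
    rng.shuffle(order)
    assign = [0] * len(points)
    for pos, idx in enumerate(order):
        assign[idx] = pos % k

    # Step 2: refinement. Move a point to its nearest center only if the
    # cluster it leaves still satisfies the minimum-size constraint.
    for _ in range(max_iters):
        centers = [
            centroid([p for p, a in zip(points, assign) if a == c])
            for c in range(k)
        ]
        sizes = [assign.count(c) for c in range(k)]
        moved = False
        for i, p in enumerate(points):
            src = assign[i]
            dst = min(range(k), key=lambda c: math.dist(p, centers[c]))
            if dst != src and sizes[src] > min_size:
                assign[i] = dst
                sizes[src] -= 1
                sizes[dst] += 1
                moved = True
        if not moved:
            break
    return assign
```

Calling `constrained_refine(points, k=2, min_size=3)` on a list of 2-D tuples returns one cluster label per point. Because a move is rejected whenever it would shrink a cluster below `min_size`, every intermediate assignment stays feasible, which is the property the abstract attributes to its refinement step.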

Cite


Tung, A. K. H., Han, J., Lakshmanan, L. V. S., & Ng, R. T. (2001). Constraint-based clustering in large databases. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 1973, pp. 405–419). Springer Verlag. https://doi.org/10.1007/3-540-44503-x_26
