Constraint selection by committee: An ensemble approach to identifying informative constraints for semi-supervised clustering

Derek Greene; Pádraig Cunningham

Conference Proceedings, Open Access

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2007), vol. 4701 LNAI, pp. 140-151

DOI: 10.1007/978-3-540-74958-5_16

Abstract

A number of clustering algorithms have been proposed for use in tasks where a limited degree of supervision is available. This prior knowledge is frequently provided in the form of pairwise must-link and cannot-link constraints. While incorporating pairwise supervision has the potential to improve clustering accuracy, the composition and cardinality of the constraint sets can significantly affect the level of improvement. We demonstrate that it is often possible to correctly "guess" a large number of constraints, without supervision, from the co-associations between pairs of objects in an ensemble of clusterings. Along the same lines, we establish that constraints on pairs with uncertain co-associations are particularly informative when their true labels are known. An evaluation on text data shows that this uncertainty provides an effective criterion for identifying constraints, reducing the level of supervision required to direct a clustering algorithm to an accurate solution. © Springer-Verlag Berlin Heidelberg 2007.
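The co-association idea in the abstract can be illustrated with a short Python sketch. This is not the authors' implementation: the abstract does not give the selection procedure, so the function names `co_association` and `split_pairs`, the thresholds `low` and `high`, and the toy ensemble are all assumptions made for illustration. The sketch computes, for every pair of objects, the fraction of ensemble clusterings that place the pair in the same cluster, treats consistently co-clustered pairs as "guessed" must-link constraints, consistently separated pairs as "guessed" cannot-link constraints, and ranks the remaining uncertain pairs as candidates for supervision.

```python
import numpy as np


def co_association(labelings):
    """Fraction of ensemble members that put each pair of objects in the same cluster."""
    labelings = np.asarray(labelings)  # shape: (n_clusterings, n_objects)
    same = labelings[:, :, None] == labelings[:, None, :]
    return same.mean(axis=0)  # shape: (n_objects, n_objects)


def split_pairs(coassoc, low=0.2, high=0.8):
    """Split object pairs by co-association value.

    `low` and `high` are hypothetical thresholds chosen for this sketch,
    not values taken from the paper.
    """
    rows, cols = np.triu_indices(coassoc.shape[0], k=1)  # each pair once, i < j
    pairs = list(zip(rows.tolist(), cols.tolist()))
    values = coassoc[rows, cols]

    must_link = [p for p, v in zip(pairs, values) if v >= high]
    cannot_link = [p for p, v in zip(pairs, values) if v <= low]
    uncertain = [p for p, v in zip(pairs, values) if low < v < high]
    # Most uncertain first: co-association closest to 0.5.
    uncertain.sort(key=lambda p: abs(coassoc[p] - 0.5))
    return must_link, cannot_link, uncertain


# Toy ensemble (made-up data): 4 clusterings of 5 objects.
ensemble = [
    [0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1],
    [1, 1, 0, 0, 0],
    [0, 0, 1, 1, 0],
]
coassoc = co_association(ensemble)
must_link, cannot_link, uncertain = split_pairs(coassoc)
print("guessed must-link:", must_link)
print("guessed cannot-link:", cannot_link)
print("pairs to query first:", uncertain[:3])
```

Under these assumptions, the pairs at the head of `uncertain` are the ones an oracle would be asked to label, while the guessed constraints cost no supervision. Cluster label values need not match across ensemble members, since only within-clustering equality is compared.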

Citation (APA)

Greene, D., & Cunningham, P. (2007). Constraint selection by committee: An ensemble approach to identifying informative constraints for semi-supervised clustering. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4701 LNAI, pp. 140–151). Springer Verlag. https://doi.org/10.1007/978-3-540-74958-5_16
