Subspace metric ensembles for semi-supervised clustering of high dimensional data

Bojun Yan; Carlotta Domeniconi

Conference Proceedings, Open Access

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2006) 4212 LNAI 509-520

DOI: 10.1007/11871842_48

8 Citations

11 Readers

Abstract

A critical problem in clustering research is the definition of a proper metric to measure distances between points. Semi-supervised clustering uses information provided by the user, usually expressed as constraints, to guide the search for clusters. Learning effective metrics from constraints in high dimensional spaces remains an open challenge, because the number of parameters to be estimated is quadratic in the number of dimensions and there is seldom enough side-information to estimate them accurately. In this paper, we address the high dimensionality problem by learning an ensemble of subspace metrics. This is achieved by projecting the data and the constraints onto multiple subspaces and learning a positive semi-definite similarity matrix in each. This methodology leverages the given side-information while solving lower dimensional problems. Experiments on high dimensional data (e.g., microarray data) demonstrate the superior accuracy of our method over competing approaches. © Springer-Verlag Berlin Heidelberg 2006.
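The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not say how subspaces are chosen, how each matrix is learned, or how the ensemble is combined. The sketch assumes random axis-aligned feature subsets, swaps in a simple RCA-style estimator (the inverse of the regularized scatter of must-link differences) for the per-subspace learner, and averages squared distances across subspaces. All function names and parameters (`learn_subspace_metric`, `ensemble_distances`, `reg`, `subspace_dim`) are hypothetical.

```python
import numpy as np


def learn_subspace_metric(X_sub, must_link, reg=1e-3):
    """Hypothetical stand-in learner: inverse of the regularized scatter
    of must-link difference vectors. The result is positive definite.
    Requires at least one must-link pair."""
    ml = np.asarray(must_link)
    diffs = X_sub[ml[:, 0]] - X_sub[ml[:, 1]]
    scatter = diffs.T @ diffs / len(ml)
    k = X_sub.shape[1]
    return np.linalg.inv(scatter + reg * np.eye(k))


def ensemble_distances(X, must_link, n_subspaces=10, subspace_dim=5, seed=0):
    """Average squared Mahalanobis distances over random feature subspaces
    (random axis-aligned subspaces are an assumption of this sketch)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    total = np.zeros((n, n))
    for _ in range(n_subspaces):
        # Project data (and, implicitly, the constraints) onto a feature subset.
        feats = rng.choice(d, size=subspace_dim, replace=False)
        X_sub = X[:, feats]
        M = learn_subspace_metric(X_sub, must_link)
        # Pairwise differences, then the quadratic form diff^T M diff.
        diff = X_sub[:, None, :] - X_sub[None, :, :]
        total += np.einsum("ijk,kl,ijl->ij", diff, M, diff)
    return total / n_subspaces


# Toy usage: 20 points in 30 dimensions with three must-link pairs.
rng = np.random.default_rng(42)
X = rng.normal(size=(20, 30))
must_link = [(0, 1), (2, 3), (4, 5)]
D = ensemble_distances(X, must_link)
```

The parameter saving is easy to see in this toy setting: a full symmetric matrix in 30 dimensions has 30 × 31 / 2 = 465 free parameters, while each 5-dimensional subspace matrix has only 5 × 6 / 2 = 15. The resulting matrix `D` could be passed to any clustering routine that accepts precomputed pairwise distances.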

Cite (APA)

Yan, B., & Domeniconi, C. (2006). Subspace metric ensembles for semi-supervised clustering of high dimensional data. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4212 LNAI, pp. 509–520). Springer Verlag. https://doi.org/10.1007/11871842_48
