Intra-feature random forest clustering

Abstract

Clustering algorithms are commonly used to find structure in data without being told explicitly what to look for. One key desideratum for a clustering algorithm is that the clusters it identifies from some set of features generalize well to features that have not been measured. Yeung et al. (2001) introduce a Figure of Merit closely aligned with this desideratum, which they use to evaluate clustering algorithms. Broadly, the Figure of Merit measures the within-cluster variance of features that were not available to the clustering algorithm. Using this metric, Yeung et al. found no clustering algorithm that reliably outperformed k-means on a suite of real-world datasets. This paper presents a novel clustering algorithm, intra-feature random forest clustering (IRFC), that does outperform k-means on a variety of real-world datasets by this metric. IRFC begins by training an ensemble of depth-limited decision trees, each predicting a randomly selected feature from the remaining features. It then aggregates the partitions implied by these trees and outputs the number of clusters specified by an input parameter.
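For concreteness, here is a minimal sketch of the 2-norm Figure of Merit on a single held-out feature, assuming the root-mean-square within-cluster-deviation form described by Yeung et al. (2001); the function name and the NumPy dependency are illustrative:

    import numpy as np

    def figure_of_merit(X, labels, held_out):
        # 2-norm Figure of Merit for one held-out feature (column index
        # held_out): the root-mean-square deviation of that feature from
        # its cluster means. Lower values indicate better generalization.
        x = X[:, held_out]
        total = 0.0
        for c in np.unique(labels):
            members = x[labels == c]
            total += np.sum((members - members.mean()) ** 2)
        return np.sqrt(total / len(x))

And a sketch of the IRFC procedure itself, as the abstract describes it, using scikit-learn decision trees. The tree count, depth, and the co-association aggregation with average-linkage clustering are assumptions: the abstract says the implied partitions are aggregated into a specified number of clusters, but does not say how.

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor
    from sklearn.cluster import AgglomerativeClustering

    def irfc(X, n_clusters, n_trees=100, max_depth=3, random_state=0):
        # Cluster rows of X by aggregating the leaf partitions of
        # depth-limited trees, each trained to predict one randomly
        # chosen feature from the remaining features.
        rng = np.random.default_rng(random_state)
        n_samples, n_features = X.shape
        # Co-association matrix: fraction of trees that place a pair of
        # points in the same leaf.
        co_assoc = np.zeros((n_samples, n_samples))
        for _ in range(n_trees):
            target = rng.integers(n_features)  # randomly selected feature
            rest = np.delete(np.arange(n_features), target)
            tree = DecisionTreeRegressor(
                max_depth=max_depth, random_state=int(rng.integers(2**31)))
            tree.fit(X[:, rest], X[:, target])
            leaves = tree.apply(X[:, rest])  # leaf ids define a partition
            co_assoc += leaves[:, None] == leaves[None, :]
        co_assoc /= n_trees
        # Aggregate the tree partitions by average-linkage clustering of
        # the co-association distances (requires scikit-learn >= 1.2 for
        # the `metric` keyword; older versions use `affinity`).
        model = AgglomerativeClustering(
            n_clusters=n_clusters, metric="precomputed", linkage="average")
        return model.fit_predict(1.0 - co_assoc)

This sketch treats every feature as continuous; categorical features would call for a classifier in place of the regressor.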

Citation (APA)

Cohen, M. (2018). Intra-feature random forest clustering. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10710 LNCS, pp. 41–49). Springer Verlag. https://doi.org/10.1007/978-3-319-72926-8_4
