Benchmarking for clustering methods based on real data: A statistical view

Anne-Laure Boulesteix; Myriam Hatz

Conference Proceedings

Studies in Classification, Data Analysis, and Knowledge Organization (2017), Vol. 0, pp. 73–82

DOI: 10.1007/978-3-319-55723-6_6

Abstract

In analogy to clinical trials, a benchmark experiment based on real datasets can be viewed as one in which the datasets play the role of patients and the compared methods play the role of treatments. This view of benchmark experiments, which has already been suggested in the literature, brings to light the importance of statistical concepts such as testing, confidence intervals, power calculation, and sampling procedures for the interpretation of benchmarking results. In this paper we propose an application of these concepts to the special case of benchmark experiments comparing clustering algorithms. We present a simple exemplary benchmarking study comparing two classical clustering algorithms on 50 high-dimensional gene expression datasets and discuss the interpretation of its results from a critical statistical perspective. The R code implementing the analyses presented in this paper is freely available from: http://www.ibe.med.uni-muenchen.de/organisation/mitarbeiter/020_professuren/boulesteix/boulesteixhatz.
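
The "datasets as patients, methods as treatments" view can be illustrated with a minimal Python sketch of a paired comparison across datasets. This is not the authors' R code, and the abstract does not say which clustering algorithms, quality score, or test the paper uses. The function names paired_summary and required_datasets, the normal approximation, and the synthetic scores below are all assumptions made for illustration.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def paired_summary(scores_a, scores_b, alpha=0.05):
    """Mean per-dataset difference (A minus B), its confidence interval,
    and a two-sided p-value, all based on a normal approximation."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    d_bar = mean(diffs)
    se = stdev(diffs) / math.sqrt(n)  # standard error of the mean difference
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    ci = (d_bar - z_crit * se, d_bar + z_crit * se)
    p_value = 2 * (1 - NormalDist().cdf(abs(d_bar) / se))
    return d_bar, ci, p_value


def required_datasets(delta, sd_diff, alpha=0.05, power=0.8):
    """Approximate number of datasets needed to detect a mean difference
    delta, given the standard deviation of the paired differences."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(((z_alpha + z_beta) * sd_diff / delta) ** 2)


# Synthetic stand-in scores for 50 datasets (hypothetical, NOT results
# from the paper): one quality score per dataset and per method.
rng = random.Random(0)
scores_a = [rng.uniform(0.2, 0.8) for _ in range(50)]
scores_b = [a + rng.gauss(0.0, 0.1) for a in scores_a]

d_bar, ci, p_value = paired_summary(scores_a, scores_b)
print(f"mean difference: {d_bar:.3f}")
print(f"95% CI: ({ci[0]:.3f}, {ci[1]:.3f})")
print(f"p-value: {p_value:.3f}")
print("datasets needed:", required_datasets(delta=0.05, sd_diff=0.1))
```

Each dataset contributes one paired observation, so the number of datasets plays the role of the sample size. With few datasets, a t-based interval or a nonparametric test would be a more cautious choice than the normal approximation used here.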

Citation (APA)

Boulesteix, A. L., & Hatz, M. (2017). Benchmarking for clustering methods based on real data: A statistical view. In Studies in Classification, Data Analysis, and Knowledge Organization (Vol. 0, pp. 73–82). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-319-55723-6_6
