By analogy with clinical trials, a benchmark experiment based on real datasets can be viewed as a study in which the datasets play the role of patients and the compared methods play the role of treatments. This view of benchmark experiments, which has already been suggested in the literature, highlights the importance of statistical concepts such as testing, confidence intervals, power calculation, and sampling procedures for the interpretation of benchmarking results. In this paper we propose an application of these concepts to the special case of benchmark experiments comparing clustering algorithms. We present a simple illustrative benchmarking study that compares two classical clustering algorithms on 50 high-dimensional gene expression datasets, and we discuss the interpretation of its results from a critical statistical perspective. The R code implementing the analyses presented in this paper is freely available from: http://www.ibe.med.uni-muenchen.de/organisation/mitarbeiter/020_professuren/boulesteix/boulesteixhatz.
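The patients-and-treatments analogy can be made concrete with a minimal sketch, which is not the authors' R code. It assumes that each dataset yields one quality score per method (for example, an agreement index with a known class structure) and treats the per-dataset differences as paired observations. The function names `paired_summary` and `sign_test_p_value` are hypothetical, and the confidence interval uses a normal approximation to the t quantile, which is reasonable only when the number of datasets is fairly large.

```python
import math
import statistics


def paired_summary(scores_a, scores_b, level=0.95):
    """Mean per-dataset difference (A minus B) with an approximate CI.

    Each dataset is one observation unit (the "patient"); the two
    methods are the "treatments" applied to that same unit.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both methods must be scored on the same datasets")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two datasets")
    mean_diff = statistics.mean(diffs)
    std_err = statistics.stdev(diffs) / math.sqrt(n)
    # Assumption: normal quantile used in place of the t quantile.
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    return mean_diff, (mean_diff - z * std_err, mean_diff + z * std_err)


def sign_test_p_value(scores_a, scores_b):
    """Exact two-sided sign test on the paired differences (ties dropped)."""
    diffs = [a - b for a, b in zip(scores_a, scores_b) if a != b]
    n = len(diffs)
    if n == 0:
        return 1.0  # no dataset favours either method
    wins = sum(d > 0 for d in diffs)
    k = min(wins, n - wins)
    # Probability of a split at least this extreme under a fair coin.
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

Calling `paired_summary(scores_a, scores_b)` on two equal-length lists of per-dataset scores returns the mean difference and its approximate interval. The sign test is used here because it makes no distributional assumption about the differences; it is one of several possible choices, not necessarily the test used in the paper.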
Boulesteix, A. L., & Hatz, M. (2017). Benchmarking for clustering methods based on real data: A statistical view. In Studies in Classification, Data Analysis, and Knowledge Organization (Vol. 0, pp. 73–82). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-319-55723-6_6