Benchmarking for clustering methods based on real data: A statistical view

Abstract

By analogy with clinical trials, a benchmark experiment based on real datasets can be viewed with the considered datasets playing the role of patients and the compared methods playing the role of treatments. This view of benchmark experiments, which has already been suggested in the literature, highlights the importance of statistical concepts such as testing, confidence intervals, power calculation, and sampling procedure for interpreting benchmarking results. In this paper we apply these concepts to the special case of benchmark experiments comparing clustering algorithms. We present a simple illustrative benchmarking study comparing two classical clustering algorithms on 50 high-dimensional gene expression datasets and discuss the interpretation of its results from a critical statistical perspective. The R code implementing the analyses presented in this paper is freely available from: http://www.ibe.med.uni-muenchen.de/organisation/mitarbeiter/020_professuren/boulesteix/boulesteixhatz.
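To make the clinical-trial analogy concrete, the following is a minimal Python sketch, not the authors' R code. It assumes that each dataset yields one performance score per method (for example an adjusted Rand index) and that the datasets can be treated as an independent sample, like patients. The function names `paired_summary`, `sign_test_pvalue` and `datasets_needed` are hypothetical. The sketch shows a paired confidence interval for the mean difference, an exact sign test (one possible choice of test, not necessarily the one used in the paper), and a sample-size calculation for the number of datasets. It uses normal (z) approximations throughout, which is an assumption; t-based versions are more exact for few datasets.

```python
import math
from statistics import NormalDist, mean, stdev


def paired_summary(scores_a, scores_b, alpha=0.05):
    """Mean per-dataset difference (A - B) with a normal-approximation CI.

    Assumption: datasets are an i.i.d. sample, like patients in a trial.
    """
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equal-length score lists with >= 2 datasets")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    d_bar = mean(diffs)
    se = stdev(diffs) / math.sqrt(len(diffs))  # standard error of the mean difference
    z = NormalDist().inv_cdf(1 - alpha / 2)    # two-sided normal quantile
    return d_bar, (d_bar - z * se, d_bar + z * se)


def sign_test_pvalue(scores_a, scores_b):
    """Exact two-sided sign test on paired scores; ties are discarded."""
    wins = sum(a > b for a, b in zip(scores_a, scores_b))
    losses = sum(a < b for a, b in zip(scores_a, scores_b))
    n = wins + losses
    if n == 0:
        return 1.0
    k = min(wins, losses)
    # P(X <= k) for X ~ Binomial(n, 1/2), doubled for a two-sided test
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def datasets_needed(delta, sd_diff, alpha=0.05, power=0.8):
    """Datasets needed for a two-sided paired z-test to detect a mean
    difference `delta`, given the SD `sd_diff` of per-dataset differences."""
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - alpha / 2)
    z_beta = nd.inv_cdf(power)
    return math.ceil(((z_alpha + z_beta) * sd_diff / delta) ** 2)
```

For instance, `datasets_needed(0.05, 0.15)` returns 71. Under these assumptions, detecting a mean difference of 0.05 when per-dataset differences have standard deviation 0.15 would require more datasets than the 50 used in the study. This is the kind of power consideration the abstract argues should accompany benchmark results.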

Boulesteix, A. L., & Hatz, M. (2017). Benchmarking for clustering methods based on real data: A statistical view. In Studies in Classification, Data Analysis, and Knowledge Organization (pp. 73–82). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-319-55723-6_6