It is argued that the best number of clusters k depends crucially on the aim of clustering. Existing supposedly “objective” methods of estimating k ignore this. Instead, k can be determined by listing the requirements for a good clustering in the given application and finding a k that fulfils them all. The approach is illustrated by application to the problem of finding the number of species in a data set of Australasian Tetragonula bees. Requirements here include two new statistics formalising the largest within-cluster gap and cluster separation. Due to the typical nature of expert knowledge, it is difficult to make the requirements precise, and a number of subjective decisions are involved.
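The idea of choosing k by checking explicit requirements can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's procedure: the abstract does not define the two statistics, so the sketch assumes the widest within-cluster gap is the longest minimum-spanning-tree edge inside any cluster, and uses the smallest between-cluster distance as a simplified stand-in for separation. The function names, the thresholds `max_gap` and `min_sep`, the toy data, and the candidate partitions are all hypothetical.

```python
from itertools import combinations
from math import dist


def widest_within_gap(points, labels):
    """Assumed definition: longest minimum-spanning-tree edge inside any cluster."""
    widest = 0.0
    for c in set(labels):
        members = [p for p, l in zip(points, labels) if l == c]
        if len(members) < 2:
            continue
        # Prim's algorithm: best[i] is the distance from the growing tree to member i.
        best = [dist(members[0], m) for m in members]
        in_tree = [False] * len(members)
        in_tree[0] = True
        for _ in range(len(members) - 1):
            outside = [i for i in range(len(members)) if not in_tree[i]]
            j = min(outside, key=best.__getitem__)
            widest = max(widest, best[j])
            in_tree[j] = True
            for i in outside:
                best[i] = min(best[i], dist(members[j], members[i]))
    return widest


def separation(points, labels):
    """Simplified stand-in: smallest distance between points in different clusters."""
    between = [
        dist(p, q)
        for (p, lp), (q, lq) in combinations(zip(points, labels), 2)
        if lp != lq
    ]
    return min(between) if between else float("inf")


def admissible_ks(points, partitions, max_gap, min_sep):
    """Return every k whose partition meets both (hypothetical) requirements."""
    return [
        k
        for k, labels in sorted(partitions.items())
        if widest_within_gap(points, labels) <= max_gap
        and separation(points, labels) >= min_sep
    ]


# Toy one-dimensional data with two obvious groups (made up for illustration).
points = [(0,), (1,), (2,), (10,), (11,), (12,)]
# Candidate partitions for each k, as any clustering method might supply them.
partitions = {
    1: [0, 0, 0, 0, 0, 0],
    2: [0, 0, 0, 1, 1, 1],
    3: [0, 0, 1, 2, 2, 2],
}

print(admissible_ks(points, partitions, max_gap=2.0, min_sep=3.0))  # [2]
```

In this toy case k = 1 fails the gap requirement (one cluster spans a gap of 8), k = 3 fails the separation requirement (two clusters lie only 1 apart), and k = 2 satisfies both. The thresholds carry the subjective, application-specific decisions that the abstract refers to.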
CITATION STYLE
Hennig, C. (2014). How many bee species? A case study in determining the number of clusters. In Studies in Classification, Data Analysis, and Knowledge Organization (Vol. 47, pp. 41–49). Kluwer Academic Publishers. https://doi.org/10.1007/978-3-319-01595-8_5