Measures of dispersion and cluster-trees for categorical data


Abstract

A clustering algorithm is, in essence, characterized by two features: (1) the way in which the heterogeneity within or between clusters is measured (the objective function), and (2) the steps by which splitting or merging proceeds. For categorical data there are no "standard indices" formalizing the first aspect. Instead, a number of ad hoc concepts have been used in cluster analysis, labelled "similarity", "information", "impurity" and the like. To clarify matters, we start out from a set of axioms summarizing our conception of "dispersion" for categorical attributes. Unsurprisingly, it turns out that some well-known measures, including the Gini index and the entropy, qualify as measures of dispersion. We indicate how these measures can be used in unsupervised classification problems as well. Due to its simple analytic form, the Gini index admits a dispersion-decomposition formula that can serve as the starting point for a CART-like cluster tree. Trees are favoured because of (i) factor selection and (ii) communicability.
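As a minimal illustration of the two measures named in the abstract, the following sketch computes the Gini index and the Shannon entropy of the empirical distribution of a categorical attribute. Function names and the base-2 logarithm are illustrative choices, not taken from the article; both quantities are zero for a constant attribute and maximal for a uniform one, which is the dispersion-like behaviour the axioms formalize.

```python
from collections import Counter
import math

def gini(values):
    """Gini index: 1 minus the sum of squared relative frequencies."""
    n = len(values)
    counts = Counter(values)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def entropy(values):
    """Shannon entropy (base 2) of the empirical distribution."""
    n = len(values)
    counts = Counter(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# A balanced two-category attribute is maximally dispersed:
#   gini(["a", "a", "b", "b"])    -> 0.5
#   entropy(["a", "a", "b", "b"]) -> 1.0
# A constant attribute has zero dispersion under both measures.
```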

Citation (APA)

Müller-Funk, U. (2008). Measures of dispersion and cluster-trees for categorical data. In Studies in Classification, Data Analysis, and Knowledge Organization (pp. 163–170). Kluwer Academic Publishers. https://doi.org/10.1007/978-3-540-78246-9_20
