Clustering Algorithms for Categorical Data: A Monte Carlo Study

  • A. Mingoti S
  • A. Matos R

Abstract

In this paper the clustering algorithms average linkage, ROCK, k-modes, fuzzy k-modes and k-populations were compared by means of Monte Carlo simulation. Data were simulated from Beta and Uniform distributions, considering factors such as cluster overlapping and the number of groups, variables and categories. A total of 64 population structures of clusters were simulated, considering smaller and larger degrees of overlapping and different numbers of clusters, variables and categories. The results showed that overlapping was the factor with the greatest impact on the algorithms' accuracy, which decreases as the number of clusters increases. In general, ROCK presented the best performance in both the overlapping and non-overlapping cases, followed by k-modes and fuzzy k-modes. The k-populations algorithm showed better accuracy only in cases with a small degree of overlapping, where its performance was similar to that of average linkage. The superiority of the k-populations algorithm over k-modes and fuzzy k-modes reported in previous studies, which were based only on benchmark data, was not confirmed in this simulation study.
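To make one of the compared methods concrete, the following is a minimal sketch of the basic k-modes iteration: each object is assigned to the nearest mode under simple matching dissimilarity (the number of variables on which two objects differ), and each mode is then updated to the most frequent category of every variable within its cluster. This is an illustrative sketch, not the authors' implementation. The function names `matching_dissimilarity` and `k_modes`, the deterministic initialization from the first k distinct rows, and the toy data are all assumptions made for this example.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Row = Tuple[str, ...]


def matching_dissimilarity(a: Sequence[str], b: Sequence[str]) -> int:
    """Simple matching: number of variables on which the two objects differ."""
    return sum(x != y for x, y in zip(a, b))


def k_modes(data: List[Row], k: int, max_iter: int = 100) -> Tuple[List[int], List[Row]]:
    """Basic k-modes sketch (hypothetical helper, not a library API).

    Assumption: the initial modes are the first k distinct rows, chosen for
    reproducibility; practical implementations usually use random or
    frequency-based initialization with several restarts.
    """
    modes: List[Row] = []
    for row in data:
        if row not in modes:
            modes.append(row)
        if len(modes) == k:
            break
    if len(modes) < k:
        raise ValueError("need at least k distinct rows")

    labels = [-1] * len(data)
    for _ in range(max_iter):
        # Assignment step: nearest mode under simple matching (ties go to the lowest index).
        new_labels = [
            min(range(k), key=lambda j: matching_dissimilarity(row, modes[j]))
            for row in data
        ]
        if new_labels == labels:
            break  # converged: no object changed cluster
        labels = new_labels
        # Update step: each mode becomes the per-variable most frequent category.
        for j in range(k):
            members = [row for row, lab in zip(data, labels) if lab == j]
            if members:  # keep the previous mode if a cluster becomes empty
                modes[j] = tuple(
                    Counter(column).most_common(1)[0][0] for column in zip(*members)
                )
    return labels, modes


# Hypothetical toy data: two groups of objects described by three categorical variables.
toy_data: List[Row] = [
    ("a", "x", "p"),
    ("b", "z", "r"),
    ("a", "x", "q"),
    ("b", "z", "s"),
    ("a", "y", "p"),
    ("c", "z", "r"),
]

labels, modes = k_modes(toy_data, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
print(modes)   # [('a', 'x', 'p'), ('b', 'z', 'r')]
```

Because the result depends on the initial modes, a poor starting choice (for example, two initial modes taken from the same group) can leave k-modes stuck in a weak partition. This sensitivity is one reason simulation studies of clustering accuracy, such as the one summarized above, are informative in addition to benchmark data sets.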

A. Mingoti, S., & A. Matos, R. (2012). Clustering Algorithms for Categorical Data: A Monte Carlo Study. International Journal of Statistics and Applications, 2(4), 24–32. https://doi.org/10.5923/j.statistics.20120204.01