Abstract
In this paper, the clustering algorithms average linkage, ROCK, k-modes, fuzzy k-modes and k-populations were compared by means of Monte Carlo simulation. Data were simulated from Beta and Uniform distributions, with the degree of cluster overlapping and the number of clusters, variables and categories as factors. A total of 64 cluster population structures were simulated across lower and higher levels of these factors. The results showed that overlapping was the factor with the greatest impact on the algorithms' accuracy, which also decreases as the number of clusters increases. In general, ROCK presented the best performance across overlapping and non-overlapping cases, followed by k-modes and fuzzy k-modes. The k-populations algorithm showed better accuracy only in cases with a small degree of overlapping, where its performance was similar to that of average linkage. The superiority of the k-populations algorithm over k-modes and fuzzy k-modes reported in previous studies, which were based only on benchmark data, was not confirmed in this simulation study.
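For readers unfamiliar with k-modes, one of the algorithms compared above, the following is a minimal sketch and not the implementation used in the study. It assumes the simple matching dissimilarity, random selection of distinct observations as initial modes, and a batch update of the modes after each full assignment pass. The function names (`matching_distance`, `column_modes`, `k_modes`) and the toy data are illustrative only.

```python
from collections import Counter
import random


def matching_distance(a, b):
    """Simple matching dissimilarity: number of attributes where a and b differ."""
    return sum(x != y for x, y in zip(a, b))


def column_modes(rows):
    """Most frequent category in each column of a non-empty list of rows."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))


def k_modes(data, k, max_iter=100, seed=0):
    """Cluster tuples of categorical values into k groups (illustrative sketch)."""
    rng = random.Random(seed)
    # Assumption: initial modes are k distinct observations chosen at random.
    modes = rng.sample(sorted(set(data)), k)
    labels = None
    for _ in range(max_iter):
        # Assign each observation to its nearest mode.
        new_labels = [
            min(range(k), key=lambda j: matching_distance(row, modes[j]))
            for row in data
        ]
        if new_labels == labels:
            break  # assignments are stable, so the modes will not change
        labels = new_labels
        # Recompute each mode from its current members (batch update).
        for j in range(k):
            members = [row for row, lab in zip(data, labels) if lab == j]
            if members:  # keep the previous mode if a cluster becomes empty
                modes[j] = column_modes(members)
    return labels, modes


# Toy data: two well-separated groups of categorical observations.
data = [("a", "x", "p")] * 5 + [("b", "y", "q")] * 5
labels, modes = k_modes(data, k=2)
```

With non-overlapping groups like these, the assignment step separates the two groups immediately. The study's finding is that accuracy degrades once the clusters overlap, which this toy example does not attempt to reproduce.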
Mingoti, S. A., & Matos, R. A. (2012). Clustering Algorithms for Categorical Data: A Monte Carlo Study. International Journal of Statistics and Applications, 2(4), 24–32. https://doi.org/10.5923/j.statistics.20120204.01