An efficient k-modes algorithm for clustering categorical datasets

Abstract

Mining clusters from data is an important endeavor in many applications. The k-means method is a popular, efficient, and distribution-free approach for clustering numerical-valued data, but it does not apply to categorical-valued observations. The k-modes method addresses this lacuna by replacing the Euclidean distance with the Hamming distance and the means with the modes in the k-means objective function. We provide a novel, computationally efficient implementation of k-modes, called Optimal Transfer Quick Transfer (OTQT). We prove that OTQT finds updates that improve the objective function and that are undetectable to existing k-modes algorithms. Although OTQT is slightly slower per iteration because of its greater algorithmic complexity, it is always more accurate and almost always reaches the final optimum faster (on some datasets it is only barely slower). Thus, we recommend OTQT as the preferred, default algorithm for k-modes optimization.
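
To make the objective described in the abstract concrete, the following is a minimal Python sketch of k-modes: the Hamming distance stands in for the Euclidean distance, and the coordinate-wise mode stands in for the mean. It uses a plain alternating (Lloyd-style) update and is not the OTQT algorithm, whose transfer steps the abstract does not describe. The function names, the tuple-based data layout, and the toy records are assumptions made for illustration only.

```python
from collections import Counter


def hamming(a, b):
    """Number of coordinates at which two equal-length records differ."""
    return sum(x != y for x, y in zip(a, b))


def column_modes(rows):
    """Coordinate-wise mode of a non-empty list of records."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))


def kmodes_objective(data, modes, labels):
    """Sum of Hamming distances from each record to its cluster's mode."""
    return sum(hamming(x, modes[c]) for x, c in zip(data, labels))


def kmodes_lloyd(data, init_modes, max_iter=100):
    """Alternate assignment and mode updates (illustrative only, not OTQT)."""
    modes = [tuple(m) for m in init_modes]
    labels = None
    for _ in range(max_iter):
        # Assignment step: send each record to its nearest mode.
        new_labels = [
            min(range(len(modes)), key=lambda c: hamming(x, modes[c]))
            for x in data
        ]
        if new_labels == labels:
            break  # assignments are stable, so the modes are too
        labels = new_labels
        # Update step: recompute the mode of every non-empty cluster.
        for c in range(len(modes)):
            members = [x for x, lab in zip(data, labels) if lab == c]
            if members:
                modes[c] = column_modes(members)
    return modes, labels


# Hypothetical toy data: six records with two categorical attributes.
data = [("a", "x"), ("a", "x"), ("a", "y"),
        ("b", "z"), ("b", "z"), ("c", "z")]
modes, labels = kmodes_lloyd(data, init_modes=[("a", "x"), ("b", "z")])
print(modes, labels, kmodes_objective(data, modes, labels))
```

Like k-means, an alternating scheme of this kind only reaches a local optimum that depends on the initial modes; the abstract's claim is that OTQT finds improving updates that such existing schemes cannot detect.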

Cite

Dorman, K. S., & Maitra, R. (2022). An efficient k-modes algorithm for clustering categorical datasets. Statistical Analysis and Data Mining, 15(1), 83–97. https://doi.org/10.1002/sam.11546
