A k-mean clustering algorithm for mixed numeric and categorical data

Abstract

The traditional k-mean type algorithm is limited to numeric data. This paper presents a clustering algorithm based on the k-mean paradigm that works well for data with mixed numeric and categorical features. We propose a new cost function and a distance measure based on the co-occurrence of values. The measures also take into account the significance of an attribute towards the clustering process. We present a modified description of the cluster center to overcome the numeric-data-only limitation of the k-mean algorithm and to provide a better characterization of clusters. The performance of this algorithm has been studied on real-world data sets. Comparisons with other clustering algorithms illustrate the effectiveness of this approach. © 2007 Elsevier B.V. All rights reserved.
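
To make the idea of a co-occurrence-based distance between categorical values concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's definition: the abstract does not give the formula, so the sketch assumes that two values of one attribute are close when they co-occur with similar value distributions in the other attributes, and it measures that with the total-variation distance between the conditional distributions, averaged over the other attributes. The function name `cooccurrence_distance` and the row-of-tuples data layout are hypothetical. Attribute significance and the modified cluster center are not covered here.

```python
from collections import Counter


def cooccurrence_distance(rows, attr, x, y):
    """Distance between values x and y of column `attr` (illustrative only).

    Assumption: the distance is the total-variation distance between the
    conditional distributions P(u | x) and P(u | y) of every other column,
    averaged over those columns. The result lies in [0, 1].
    """
    others = [j for j in range(len(rows[0])) if j != attr]
    if not others:
        raise ValueError("need at least one other attribute to compare against")

    total = 0.0
    for j in others:
        # Count how often each value of column j co-occurs with x and with y.
        cx = Counter(r[j] for r in rows if r[attr] == x)
        cy = Counter(r[j] for r in rows if r[attr] == y)
        nx, ny = sum(cx.values()), sum(cy.values())
        if nx == 0 or ny == 0:
            raise ValueError("both values must occur in the data")
        # sum_u max(P(u|x), P(u|y)) - 1 equals the total-variation distance.
        values = set(cx) | set(cy)
        total += sum(max(cx[u] / nx, cy[u] / ny) for u in values) - 1.0
    return total / len(others)
```

With this sketch, two values that never share a co-occurring value in the other attributes are at distance 1, and a value compared with itself is at distance 0.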

Cite

Ahmad, A., & Dey, L. (2007). A k-mean clustering algorithm for mixed numeric and categorical data. Data and Knowledge Engineering, 63(2), 503–527. https://doi.org/10.1016/j.datak.2007.03.016
