A k-mean clustering algorithm for mixed numeric and categorical data

Amir Ahmad; Lipika Dey

Journal Article

Data and Knowledge Engineering (2007) 63(2) 503-527

DOI: 10.1016/j.datak.2007.03.016

Abstract

The use of traditional k-mean-type algorithms is limited to numeric data. This paper presents a clustering algorithm based on the k-mean paradigm that works well for data with mixed numeric and categorical features. We propose a new cost function and distance measure based on the co-occurrence of values. The measures also take into account the significance of an attribute towards the clustering process. We present a modified description of the cluster center to overcome the numeric-data-only limitation of the k-mean algorithm and to provide a better characterization of clusters. The performance of this algorithm has been studied on real-world data sets. Comparisons with other clustering algorithms illustrate the effectiveness of this approach. © 2007 Elsevier B.V. All rights reserved.
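The abstract names three ingredients: a co-occurrence-based distance between categorical values, a per-attribute significance, and a cluster center that also works for categorical features. The Python sketch below illustrates one way these could fit together; it is not the authors' exact formulation. Assumptions: the distance between two values of a categorical attribute is the total variation distance between the conditional distributions of the other categorical attributes, averaged over those attributes; significance is the average pairwise value distance; a categorical cluster center is stored as value frequencies; numeric attributes enter the cost as weighted squared deviations from the cluster mean. All function names, the toy data, and the weight 0.5 are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def cond_dist(data, i, j):
    """Map each value x of column i to the empirical distribution P(column j | column i = x)."""
    counts = defaultdict(Counter)
    for row in data:
        counts[row[i]][row[j]] += 1
    return {x: {u: c / sum(cnt.values()) for u, c in cnt.items()}
            for x, cnt in counts.items()}


def value_distance(data, i, x, y, cat_idx):
    """Co-occurrence dissimilarity in [0, 1] between values x and y of categorical column i.

    ASSUMPTION: for every other categorical column j, compare P(j | i = x) with
    P(j | i = y) by total variation distance, computed as sum(max(p, q)) - 1,
    and average the result over those columns.
    """
    others = [j for j in cat_idx if j != i]
    if not others:
        return float(x != y)  # no co-occurrence information: fall back to 0/1 mismatch
    total = 0.0
    for j in others:
        cd = cond_dist(data, i, j)
        px, py = cd[x], cd[y]
        support = set(px) | set(py)
        total += sum(max(px.get(u, 0.0), py.get(u, 0.0)) for u in support) - 1.0
    return total / len(others)


def significance(data, i, cat_idx):
    """Hypothetical significance: average distance over all pairs of values in column i."""
    values = sorted({row[i] for row in data})
    pairs = list(combinations(values, 2))
    if not pairs:
        return 0.0
    return sum(value_distance(data, i, x, y, cat_idx) for x, y in pairs) / len(pairs)


def center_distance(data, i, x, cluster_rows, cat_idx):
    """Distance from value x to a categorical 'center' kept as value frequencies of column i."""
    freq = Counter(row[i] for row in cluster_rows)
    n = len(cluster_rows)
    return sum((c / n) * (0.0 if v == x else value_distance(data, i, x, v, cat_idx))
               for v, c in freq.items())


def mixed_cost(data, row, cluster_rows, num_idx, cat_idx, weights):
    """Cost of a row w.r.t. a cluster: weighted squared numeric gaps plus squared categorical distances."""
    n = len(cluster_rows)
    cost = 0.0
    for t in num_idx:
        mean = sum(r[t] for r in cluster_rows) / n  # numeric part of the center is the mean
        cost += (weights[t] * (row[t] - mean)) ** 2
    for t in cat_idx:
        cost += center_distance(data, t, row[t], cluster_rows, cat_idx) ** 2
    return cost


# Hypothetical toy data: three categorical columns and one numeric column.
DATA = [
    ("red", "small", "yes", 1.0),
    ("red", "small", "yes", 2.0),
    ("blue", "large", "no", 8.0),
    ("blue", "large", "no", 9.0),
    ("green", "small", "no", 5.0),
]
CAT, NUM = [0, 1, 2], [3]

if __name__ == "__main__":
    print(value_distance(DATA, 0, "red", "blue", CAT))   # 1.0
    print(value_distance(DATA, 0, "red", "green", CAT))  # 0.5
    print(significance(DATA, 0, CAT))                    # about 0.667
    cluster = [DATA[0], DATA[1], DATA[4]]
    print(mixed_cost(DATA, DATA[4], cluster, NUM, CAT, {3: 0.5}))  # about 1.781 (577/324)
```

"red" and "green" differ by only 0.5 because they co-occur with the same size value but different yes/no values, which is the kind of structure a plain 0/1 mismatch cannot express. Because `cond_dist` is recomputed on every call, a practical version would precompute all pairwise value distances once before the k-mean iterations; the sketch keeps them inline for readability.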

Citation (APA)

Ahmad, A., & Dey, L. (2007). A k-mean clustering algorithm for mixed numeric and categorical data. Data and Knowledge Engineering, 63(2), 503–527. https://doi.org/10.1016/j.datak.2007.03.016