A precise distance metric for mixed data clustering using chi-square statistics


Abstract

Data today is often available as a mix of numerical and categorical values. Traditional clustering algorithms perform well on numerical data but produce poor results on mixed data. For better partitioning, the distance metric must be able to discriminate between data points with mixed attributes, appropriately balancing categorical and numerical distances. In this study, we propose a chi-square-based statistical approach to determine attribute weights. The resulting weight vector is used to derive the distance matrix of the mixed dataset, which is then used to cluster the data points with traditional clustering algorithms. Experiments were carried out on the UCI benchmark datasets heart, credit and vote. In addition, we tested the proposed method on a real-time bank dataset. The clustering accuracy obtained is better than that of existing works.
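The abstract does not say how the chi-square statistic becomes an attribute weight or which per-attribute distances are combined. The sketch below is one plausible reading under stated assumptions, not the authors' method. It assumes that numerical attributes are discretized into equal-width bins, that each attribute's weight is proportional to its summed pairwise chi-square association with the other attributes, and that the mixed distance is a Gower-style weighted sum: range-normalized absolute difference for numerical attributes and 0/1 mismatch for categorical ones. All function names (`chi_square`, `discretize`, `attribute_weights`, `distance_matrix`) are hypothetical.

```python
from collections import Counter
from itertools import combinations


def chi_square(xs, ys):
    """Pearson chi-square statistic of the contingency table of two categorical columns."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    fx, fy = Counter(xs), Counter(ys)
    stat = 0.0
    for a in fx:
        for b in fy:
            expected = fx[a] * fy[b] / n  # always > 0: every level occurs at least once
            stat += (joint[(a, b)] - expected) ** 2 / expected
    return stat


def discretize(values, bins=3):
    """Assumption: equal-width binning so numerical columns can enter a contingency table."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # constant column -> everything lands in bin 0
    return [min(int((v - lo) / width), bins - 1) for v in values]


def attribute_weights(columns, numeric):
    """Assumption: weight ~ total chi-square association of an attribute with all others."""
    cats = [discretize(col) if is_num else col for col, is_num in zip(columns, numeric)]
    m = len(cats)
    scores = [0.0] * m
    for j, k in combinations(range(m), 2):
        s = chi_square(cats[j], cats[k])
        scores[j] += s
        scores[k] += s
    total = sum(scores)
    if total == 0:
        return [1.0 / m] * m  # no association at all: fall back to uniform weights
    return [s / total for s in scores]


def distance_matrix(rows, numeric, weights):
    """Gower-style weighted distance between every pair of mixed-type rows."""
    m, n = len(numeric), len(rows)
    ranges = []
    for j in range(m):
        col = [r[j] for r in rows] if numeric[j] else None
        ranges.append(((max(col) - min(col)) or 1.0) if numeric[j] else None)
    D = [[0.0] * n for _ in range(n)]
    for a, b in combinations(range(n), 2):
        d = 0.0
        for j in range(m):
            if numeric[j]:
                part = abs(rows[a][j] - rows[b][j]) / ranges[j]
            else:
                part = 0.0 if rows[a][j] == rows[b][j] else 1.0
            d += weights[j] * part
        D[a][b] = D[b][a] = d
    return D


# Usage on a hypothetical toy dataset: (age, marital status, owns home)
rows = [
    (25, "single", "yes"),
    (27, "single", "yes"),
    (61, "married", "no"),
    (58, "married", "no"),
]
numeric = [True, False, False]
columns = [list(col) for col in zip(*rows)]
w = attribute_weights(columns, numeric)
D = distance_matrix(rows, numeric, w)
```

In this toy data the three attributes are perfectly associated, so each receives weight 1/3. The resulting matrix `D` can then be passed to any clustering algorithm that accepts precomputed distances, such as k-medoids or hierarchical clustering.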

Citation (APA)

Mohanavalli, S., & Jaisakthi, S. M. (2015). A precise distance metric for mixed data clustering using chi-square statistics. Research Journal of Applied Sciences, Engineering and Technology, 10(12), 1441–1444. https://doi.org/10.19026/rjaset.10.1846
