A precise distance metric for mixed data clustering using chi-square statistics

S. Mohanavalli; S. M. Jaisakthi

Journal Article, Open Access

Research Journal of Applied Sciences, Engineering and Technology (2015) 10(12) 1441-1444

DOI: 10.19026/rjaset.10.1846

4 Citations

6 Readers

Abstract

Data today often arrives as a mix of numerical and categorical values. Traditional clustering algorithms perform well on numerical data but produce poor results on mixed data. To partition such data well, the distance metric must discriminate between data points with mixed attributes and must balance the categorical and numerical distances appropriately. In this study we propose a chi-square based statistical approach to determine the weights of the attributes. The resulting weight vector is used to derive the distance matrix of the mixed dataset, and the distance matrix is then used to cluster the data points with traditional clustering algorithms. Experiments were carried out on the UCI benchmark datasets heart, credit and vote, and the proposed method was also tested on a real-time bank dataset. The clustering accuracy obtained is better than that of existing works.
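The abstract gives no formulas, so the following is only an illustrative sketch of how a chi-square weighted distance matrix for mixed data might be computed; it is not the authors' method. It assumes, purely for illustration, that an attribute's weight is its total Pearson chi-square association with the other attributes (numeric attributes binned into equal-width bins first), that numeric distance is the range-normalized absolute difference, and that categorical distance is a simple 0/1 mismatch. All function names, the binning choice, and the toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def chi_square(xs, ys):
    """Pearson chi-square statistic of the contingency table of two label sequences."""
    n = len(xs)
    row, col, obs = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    stat = 0.0
    for r, r_count in row.items():
        for c, c_count in col.items():
            expected = r_count * c_count / n
            stat += (obs[(r, c)] - expected) ** 2 / expected
    return stat


def discretize(values, bins=3):
    """Equal-width binning so a numeric attribute can enter a contingency table."""
    lo, hi = min(values), max(values)
    width = ((hi - lo) / bins) or 1.0  # avoid division by zero for constant columns
    return [min(int((v - lo) / width), bins - 1) for v in values]


def attribute_weights(columns, is_numeric):
    """ASSUMPTION: weight = summed chi-square with all other attributes, normalized to 1."""
    labels = [discretize(c) if num else list(c) for c, num in zip(columns, is_numeric)]
    scores = [0.0] * len(columns)
    for i, j in combinations(range(len(columns)), 2):
        s = chi_square(labels[i], labels[j])
        scores[i] += s
        scores[j] += s
    total = sum(scores)
    if total == 0:  # no association anywhere: fall back to equal weights
        return [1.0 / len(columns)] * len(columns)
    return [s / total for s in scores]


def distance_matrix(columns, is_numeric, weights):
    """Weighted sum of per-attribute distances (ASSUMED forms, see lead-in)."""
    n = len(columns[0])
    spans = []
    for col, num in zip(columns, is_numeric):
        spans.append(((max(col) - min(col)) or 1.0) if num else None)
    dist = [[0.0] * n for _ in range(n)]
    for a, b in combinations(range(n), 2):
        d = 0.0
        for col, num, span, w in zip(columns, is_numeric, spans, weights):
            if num:
                d += w * abs(col[a] - col[b]) / span  # range-normalized difference
            else:
                d += w * (col[a] != col[b])  # 0/1 mismatch
        dist[a][b] = dist[b][a] = d
    return dist


# Made-up toy data: one numeric and two categorical attributes, four records.
columns = [
    [25, 30, 62, 58],                           # age (numeric)
    ["clerk", "clerk", "manager", "manager"],   # job (categorical)
    ["no", "yes", "yes", "no"],                 # owns_home (categorical)
]
is_numeric = [True, False, False]

weights = attribute_weights(columns, is_numeric)
dist = distance_matrix(columns, is_numeric, weights)
```

The resulting `dist` is a symmetric matrix with a zero diagonal, so it could be passed to any clustering routine that accepts precomputed pairwise distances.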

Cite

Mohanavalli, S., & Jaisakthi, S. M. (2015). A precise distance metric for mixed data clustering using chi-square statistics. Research Journal of Applied Sciences, Engineering and Technology, 10(12), 1441–1444. https://doi.org/10.19026/rjaset.10.1846
