In unsupervised learning, clustering is the task of finding hidden structure in data without any prior knowledge and deriving interesting patterns from the given data objects. Moreover, most real-world datasets combine numerical and categorical attributes. The K-prototype clustering algorithm is widely used to group such mixed data because it is easy to implement. Its efficiency depends on how the initial centroids are selected, and in the traditional algorithm they are chosen at random. Another constraint is that the number of clusters must be supplied as input, which requires domain-specific knowledge. An inappropriate choice for the number of clusters affects the complexity of the algorithm. This paper proposes the REDIC (Removal Dependency on K and Initial Centroid Selection) K-prototype clustering algorithm, which eliminates the dependency on these input parameters and creates clusters using an incremental approach. In place of the bit-by-bit comparison of categorical attributes, a frequency-based method is used to calculate the dissimilarity between two categorical instances. Experiments are conducted on standard datasets, and the results are compared with those of the traditional K-prototype algorithm. The better results of the REDIC K-prototype clustering algorithm demonstrate its efficiency and show that it removes the dependency on initial parameter selection.
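To make the contrast between attribute-by-attribute comparison and a frequency-based measure concrete, the following is a minimal Python sketch. The abstract does not state the REDIC formula, so `frequency_dissimilarity` is only one plausible frequency-based measure, assumed here for illustration: it scores an instance against the members of a cluster rather than against a single second instance. `mixed_dissimilarity` follows the commonly used K-prototype form (squared Euclidean distance on numerical attributes plus a weight `gamma` times the number of categorical mismatches). All function and parameter names are hypothetical.

```python
from collections import Counter


def matching_dissimilarity(cat_a, cat_b):
    """Simple matching: count the categorical attributes where the two instances differ."""
    return sum(1 for x, y in zip(cat_a, cat_b) if x != y)


def frequency_dissimilarity(instance, cluster_rows):
    """Assumed frequency-based measure (not taken from the paper).

    For each categorical attribute, add 1 minus the relative frequency
    of the instance's value among the cluster's members.
    """
    n = len(cluster_rows)
    total = 0.0
    for j, value in enumerate(instance):
        counts = Counter(row[j] for row in cluster_rows)
        total += 1.0 - counts[value] / n  # unseen values contribute 1.0
    return total


def mixed_dissimilarity(num_a, num_b, cat_a, cat_b, gamma=1.0):
    """Commonly used K-prototype cost: squared Euclidean part plus gamma-weighted mismatches."""
    numeric = sum((x - y) ** 2 for x, y in zip(num_a, num_b))
    return numeric + gamma * matching_dissimilarity(cat_a, cat_b)


if __name__ == "__main__":
    cluster = [("red", "small"), ("red", "large"), ("blue", "small"), ("red", "small")]
    print(matching_dissimilarity(("red", "small"), ("red", "large")))
    print(frequency_dissimilarity(("red", "small"), cluster))
    print(mixed_dissimilarity((1.0, 2.0), (2.0, 4.0),
                              ("red", "small"), ("red", "large"), gamma=0.5))
```

Under this assumed measure, a value that is common in a cluster contributes a small penalty and a value that never occurs contributes the full penalty of 1, whereas simple matching treats every mismatch identically.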
CITATION STYLE
Nirmal, K. R., & Satyanarayana, K. V. V. (2019). REDIC K–prototype clustering algorithm for mixed data (Numerical and categorical data). International Journal of Recent Technology and Engineering, 7(6), 1–6.