Hierarchical clustering algorithm for categorical data using a probabilistic rough set model

  • Li M
  • Deng S
  • Wang L
 et al. 


Several cluster analysis techniques for categorical data exist to divide similar objects into groups. Some can handle uncertainty in the clustering process, whereas others have stability issues. In this paper, we propose a new technique, TMDP (Total Mean Distribution Precision), for selecting the partitioning attribute based on probabilistic rough set theory. Building on this technique and the concept of granularity, we derive a new clustering algorithm for categorical data, MTMDP (Maximum Total Mean Distribution Precision). MTMDP is a robust clustering algorithm that handles uncertainty in the process of clustering categorical data. We compare MTMDP with the MMR (Min-Min-Roughness) algorithm, which is the most relevant clustering algorithm, and with other, unstable clustering algorithms such as k-modes, fuzzy k-modes and fuzzy centroids. The experimental results indicate that MTMDP produces better clustering results and can therefore be used successfully to analyze grouped categorical data. © 2014 Elsevier B.V. All rights reserved.
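
The abstract does not give the TMDP formula, so the sketch below does not implement it. As a rough illustration of the rough-set building blocks that attribute-selection measures of this kind rest on, it shows how one categorical attribute partitions the objects into granules (equivalence classes) and how classical (Pawlak) lower and upper approximations of a target set are formed from them. The function names, the toy data and the use of the classical roughness ratio are assumptions made for illustration, not the paper's method.

```python
from collections import defaultdict


def partition(rows, attr):
    """Group object indices by their value on one categorical attribute.

    Each group is an equivalence class (granule) of the indiscernibility
    relation induced by `attr`.
    """
    blocks = defaultdict(set)
    for i, row in enumerate(rows):
        blocks[row[attr]].add(i)
    return list(blocks.values())


def approximations(blocks, target):
    """Return the classical (lower, upper) approximations of `target`."""
    lower, upper = set(), set()
    for block in blocks:
        if block <= target:   # granule lies entirely inside the target
            lower |= block
        if block & target:    # granule overlaps the target
            upper |= block
    return lower, upper


def roughness(blocks, target):
    """Classical roughness: 1 - |lower| / |upper| (0.0 for an empty upper)."""
    lower, upper = approximations(blocks, target)
    return 1 - len(lower) / len(upper) if upper else 0.0


# Hypothetical toy data set: five objects, two categorical attributes.
rows = [
    {"color": "red", "size": "small"},
    {"color": "red", "size": "large"},
    {"color": "blue", "size": "small"},
    {"color": "blue", "size": "small"},
    {"color": "green", "size": "large"},
]

color_blocks = partition(rows, "color")   # granules {0, 1}, {2, 3}, {4}
size_small = {0, 2, 3}                    # objects whose size is "small"
lower, upper = approximations(color_blocks, size_small)
r = roughness(color_blocks, size_small)
```

Here the granule {2, 3} lies wholly inside the target set while {0, 1} only overlaps it, so "color" describes "size = small" only approximately. A probabilistic rough set model relaxes the strict subset test with thresholds on the conditional probability of the target within each granule; that refinement is not shown here.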

Author-supplied keywords

  • Approximation accuracy
  • Categorical data
  • Cluster analysis
  • Distribution approximation precision
  • Probabilistic rough sets


  • Min Li

  • Shaobo Deng

  • Lei Wang

  • Shengzhong Feng

  • Jianping Fan
