MMPClust: A skew prevention algorithm for model-based document clustering

Xiaoguang Li; Ge Yu; Daling Wang

Conference Proceedings

Lecture Notes in Computer Science (2005) 3453 536-547

DOI: 10.1007/11408079_47

Abstract

Model-based clustering is an intuitive choice for document clustering because it can handle very high dimensionality. However, current model-based algorithms are prone to generating skewed clusters, which seriously degrade clustering quality. In this paper, the causes of skew are examined and identified as an inappropriate initial model, a poorly fitting cluster model, and the interaction between the decentralization of estimation samples and an over-generalized cluster model. This paper proposes a skew prevention document-clustering algorithm (MMPClust), which has two features: (1) a content-based cluster model is used to model each cluster better; (2) at the re-estimation step, the documents most relevant to each cluster's corresponding class are selected automatically as that cluster's estimation samples, to break this interaction. MMPClust has fewer restrictions and wider applicability in document clustering than previous methods. © Springer-Verlag Berlin Heidelberg 2005.
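
The second feature, re-estimating each cluster model from only its most relevant documents, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the content-based cluster model or the automatic selection rule, so the sketch assumes a Laplace-smoothed multinomial model as a stand-in and a fixed, hypothetical `keep` fraction in place of automatic selection. All function and parameter names are illustrative.

```python
import math
from collections import Counter


def fit_multinomial(docs, vocab, alpha=1.0):
    """Laplace-smoothed log word probabilities (stand-in cluster model)."""
    counts = Counter(tok for doc in docs for tok in doc)
    total = sum(counts.values()) + alpha * len(vocab)
    return {w: math.log((counts[w] + alpha) / total) for w in vocab}


def log_likelihood(doc, model):
    """Average per-token log-likelihood of a document under a model."""
    return sum(model[tok] for tok in doc) / max(len(doc), 1)


def cluster_with_selection(docs, seeds, keep=0.5, iters=10):
    """Hard model-based clustering with selective re-estimation.

    docs  : list of token lists
    seeds : indices of the documents used to initialize each cluster model
    keep  : ASSUMED fixed fraction of each cluster's documents used for
            re-estimation (the paper selects these automatically)
    """
    vocab = sorted({tok for doc in docs for tok in doc})
    models = [fit_multinomial([docs[i]], vocab) for i in seeds]
    labels = None
    for _ in range(iters):
        # Assignment step: each document goes to its best-fitting model.
        new_labels = [
            max(range(len(models)), key=lambda k: log_likelihood(d, models[k]))
            for d in docs
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Re-estimation step: refit each model from only the documents
        # that its current model scores highest, not from all members.
        for k in range(len(models)):
            members = [i for i, lab in enumerate(labels) if lab == k]
            if not members:
                continue
            members.sort(
                key=lambda i: log_likelihood(docs[i], models[k]), reverse=True
            )
            n_keep = max(1, math.ceil(keep * len(members)))
            models[k] = fit_multinomial([docs[i] for i in members[:n_keep]], vocab)
    return labels
```

Restricting re-estimation to the top-ranked members keeps loosely related documents from broadening a cluster's model, which is the interaction the abstract identifies as a cause of skew. With `keep=1.0` the sketch reduces to ordinary hard model-based clustering.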

Citation (APA)

Li, X., Yu, G., & Wang, D. (2005). MMPClust: A skew prevention algorithm for model-based document clustering. In Lecture Notes in Computer Science (Vol. 3453, pp. 536–547). Springer Verlag. https://doi.org/10.1007/11408079_47
