MMPClust: A skew prevention algorithm for model-based document clustering


Abstract

Model-based clustering is an intuitive choice for document clustering because it supports very high dimensionality. However, current model-based algorithms are prone to generating skewed clusters, which seriously degrade clustering quality. In this paper, the causes of skew are examined and identified as an inappropriate initial model, a poorly fitting cluster model, and the interaction between the decentralization of estimation samples and an over-generalized cluster model. This paper proposes a skew-prevention document-clustering algorithm (MMPClust), which has two features: (1) a content-based cluster model is used to model each cluster better; (2) at the re-estimation step, the documents most relevant to each cluster's corresponding class are selected automatically as that cluster's estimation samples, which breaks this interaction. MMPClust has fewer restrictions and wider applicability in document clustering than previous methods. © Springer-Verlag Berlin Heidelberg 2005.
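
The re-estimation idea in feature (2) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the content-based cluster model, so a Laplace-smoothed multinomial model stands in for it, and relevance is assumed to be the average per-term log-likelihood of a document under its cluster's model. The names `estimate_model`, `relevance`, `cluster_with_selection`, `seed_ids`, and `keep` are all hypothetical.

```python
import math

def estimate_model(docs, vocab_size, alpha=1.0):
    """Laplace-smoothed multinomial log-probabilities (a stand-in model)."""
    counts = [alpha] * vocab_size
    for doc in docs:
        for term, n in doc.items():
            counts[term] += n
    total = sum(counts)
    return [math.log(c / total) for c in counts]

def relevance(doc, model):
    """Average per-term log-likelihood, so document length does not dominate."""
    length = max(1, sum(doc.values()))
    return sum(n * model[term] for term, n in doc.items()) / length

def cluster_with_selection(docs, seed_ids, vocab_size, keep=0.5, iters=10):
    """Hard-assignment clustering that re-estimates each cluster model from
    only its `keep` fraction of most relevant members (assumed selection rule)."""
    k = len(seed_ids)
    # Initial models: one seed document per cluster.
    models = [estimate_model([docs[i]], vocab_size) for i in seed_ids]
    labels = [0] * len(docs)
    for _ in range(iters):
        # Assignment step: each document joins its best-scoring cluster.
        best_score = [0.0] * len(docs)
        for i, doc in enumerate(docs):
            scores = [relevance(doc, m) for m in models]
            labels[i] = max(range(k), key=lambda c: scores[c])
            best_score[i] = scores[labels[i]]
        # Re-estimation step: use only the most relevant members of each cluster.
        for c in range(k):
            members = [i for i in range(len(docs)) if labels[i] == c]
            if not members:
                continue  # leave an empty cluster's model unchanged
            members.sort(key=lambda i: best_score[i], reverse=True)
            n_keep = max(1, math.ceil(keep * len(members)))
            models[c] = estimate_model([docs[i] for i in members[:n_keep]], vocab_size)
    return labels

# Documents are term-id -> count mappings; two topics over a 4-term vocabulary.
docs = [
    {0: 3, 1: 1}, {0: 2, 1: 2}, {0: 1, 1: 3},
    {2: 3, 3: 1}, {2: 2, 3: 2}, {2: 1, 3: 3},
]
print(cluster_with_selection(docs, seed_ids=[0, 3], vocab_size=4))
# prints [0, 0, 0, 1, 1, 1]
```

Restricting re-estimation to the top-ranked members keeps loosely fitting documents from broadening a cluster's model, which is the interaction the abstract names as a cause of skew. The fixed `keep` fraction here is a simplification; the abstract says the selection is made automatically.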

Cite

Li, X., Yu, G., & Wang, D. (2005). MMPClust: A skew prevention algorithm for model-based document clustering. In Lecture Notes in Computer Science (Vol. 3453, pp. 536–547). Springer Verlag. https://doi.org/10.1007/11408079_47
