A Comparative Study To Find A Suitable Method For Text Document Clustering

  • Punitha S
  • Punithavalli M

Abstract

Text mining is used in various text-related tasks such as information extraction, concept/entity extraction, document summarization, entity relation modeling (i.e., learning relations between named entities), categorization/classification and clustering. This paper focuses on document clustering, a field of text mining that groups a set of documents into meaningful categories. Its main aim is to present a performance analysis of various techniques available for document clustering. The results of this comparative study can be used to improve existing text data mining frameworks and the process of knowledge discovery. The paper considers seven clustering algorithms, arranged in three groups: Group 1, K-means and its variants (traditional K-means and K* Means); Group 2, Expectation Maximization and its variants (traditional EM, Spherical Gaussian EM, and Linear Partitioning and Reallocation (LPR) clustering using EM); Group 3, semantic-based techniques (the Hybrid method and the Feature-based algorithm). The algorithms were selected based on their popularity in the text mining field. Several experiments were conducted to analyze the performance of the algorithms and to select the winner in terms of cluster purity, clustering accuracy and speed of clustering.
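
The abstract names cluster purity as one of the evaluation criteria but does not define it on this page. As a minimal sketch, assuming the commonly used definition (each cluster is credited with its most frequent true class, and the credited documents are divided by the total number of documents), purity could be computed as below. The function name `cluster_purity` and the toy labels are illustrative assumptions, not taken from the paper.

```python
from collections import Counter, defaultdict


def cluster_purity(cluster_ids, true_labels):
    """Return the purity of a clustering under the usual definition.

    Assumption: purity = (1 / N) * sum over clusters of the size of the
    largest true class inside that cluster. This is the textbook
    definition, not necessarily the exact formula used in the paper.
    """
    if len(cluster_ids) != len(true_labels):
        raise ValueError("cluster_ids and true_labels must have equal length")
    if not cluster_ids:
        raise ValueError("at least one document is required")

    # Count how often each true class occurs inside each cluster.
    class_counts = defaultdict(Counter)
    for cluster, label in zip(cluster_ids, true_labels):
        class_counts[cluster][label] += 1

    # Credit each cluster with its majority class, then normalize by N.
    majority_total = sum(max(counts.values()) for counts in class_counts.values())
    return majority_total / len(cluster_ids)


# Hypothetical toy data: six documents, two clusters, two topics.
clusters = [0, 0, 0, 1, 1, 1]
labels = ["sports", "sports", "politics", "politics", "politics", "sports"]
purity = cluster_purity(clusters, labels)
print(f"purity = {purity:.3f}")
```

In this toy case each cluster holds two documents of its majority topic and one stray document, so four of the six documents are credited. A purity of 1.0 would mean every cluster contains documents from a single class only.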

Cite

Punitha, S. C., & Punithavalli, M. (2011). A Comparative Study To Find A Suitable Method For Text Document Clustering. International Journal of Computer Science and Information Technology, 3(6), 49–59. https://doi.org/10.5121/ijcsit.2011.3604
