Abstract
Text or document clustering is a subfield of data clustering and has been one of the research hotspots in text mining. Meanwhile, recent studies have shown that many real systems can be represented as complex networks with remarkably similar properties. In this work, a document corpus is represented as a complex network in which the nodes represent the documents and the edges are weighted according to the similarities among documents. Detecting community structures in such a network can then be seen as cluster analysis of the documents. Recently, community detection algorithms based on spectral properties of the underlying network have shown good results. The main motivation for applying these methods is that they have been shown to be robust to the high dimensionality of the feature space and to the inherent data sparsity resulting from text representation in the vector space model. The aim of this paper is to present the application of community structure detection algorithms to text mining. Experiments were carried out on document clustering problems taken from the 20 Newsgroups document corpus to evaluate the performance of the proposed approach.
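The pipeline outlined in the abstract (documents as nodes, similarity-weighted edges, spectral community detection) can be illustrated with a minimal sketch. The abstract does not name the specific similarity measure or spectral algorithm, so the choices here are assumptions made for illustration: TF-IDF vectors with cosine similarity for the edge weights, and a single bisection by the leading eigenvector of the modularity matrix. The toy corpus and the function names `tfidf_matrix`, `similarity_network`, and `spectral_bisection` are hypothetical and are not taken from the paper.

```python
import numpy as np


def tfidf_matrix(docs):
    """Return a row-normalised TF-IDF matrix for whitespace-tokenised documents.

    Assumes every document contains at least one token.
    """
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: j for j, w in enumerate(vocab)}
    tf = np.zeros((len(docs), len(vocab)))
    for i, d in enumerate(docs):
        for w in d.lower().split():
            tf[i, index[w]] += 1.0
    df = (tf > 0).sum(axis=0)              # document frequency of each term
    idf = np.log(len(docs) / df) + 1.0     # smoothed inverse document frequency
    x = tf * idf
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def similarity_network(docs):
    """Weighted adjacency matrix: cosine similarity between documents, no self-loops."""
    x = tfidf_matrix(docs)
    a = x @ x.T
    np.fill_diagonal(a, 0.0)
    return a


def spectral_bisection(a):
    """Split the network in two using the leading eigenvector of the modularity matrix.

    Assumes the network has at least one edge with positive weight.
    """
    k = a.sum(axis=1)                      # weighted degree of each node
    two_m = k.sum()                        # twice the total edge weight
    b = a - np.outer(k, k) / two_m         # modularity matrix
    _, eigvecs = np.linalg.eigh(b)         # eigenvalues in ascending order
    leading = eigvecs[:, -1]               # eigenvector of the largest eigenvalue
    return (leading > 0).astype(int)       # community label (0 or 1) per document


# Hypothetical toy corpus with two topics (not the 20 Newsgroups data).
docs = [
    "graphics image rendering card",
    "image rendering graphics driver",
    "graphics card driver image",
    "hockey team season game",
    "game season hockey player",
    "hockey team player game",
]
labels = spectral_bisection(similarity_network(docs))
print(labels)
```

In this sketch the first three documents receive one label and the last three the other; which group is labelled 0 or 1 depends on the arbitrary sign of the eigenvector. A full method would typically repeat the split recursively or use several eigenvectors to find more than two communities.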
Dos Santos, C. K., Evsukoff, A. G., & De Lima, B. S. L. P. (2008). Cluster analysis in document networks. In WIT Transactions on Information and Communication Technologies (Vol. 40, pp. 95–104). https://doi.org/10.2495/DATA080101