Cluster analysis in document networks

C. K. Dos Santos; A. G. Evsukoff; B. S.L.P. De Lima

Conference Proceedings (Open Access)

WIT Transactions on Information and Communication Technologies (2008) 40 95-104

DOI: 10.2495/DATA080101

Abstract

Text (or document) clustering is a subfield of data clustering and an active research topic in text mining. Recent studies have also shown that many real systems can be represented as complex networks with strikingly similar properties. In this work, a document corpus is represented as a complex network in which the nodes are documents and the edges are weighted according to the similarity between documents. Detecting community structure in such a network can then be seen as cluster analysis of the documents. Community detection algorithms based on spectral properties of the underlying network have recently shown good results. The main motivation for applying these methods is that they have been shown to be robust to the high dimensionality of the feature space and to the inherent data sparsity of text represented in the vector space model. The aim of this paper is to present the application of community structure detection algorithms to text mining. Experiments were carried out on document clustering problems taken from the 20 Newsgroups corpus to evaluate the performance of the proposed approach.
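The pipeline outlined in the abstract (vector space model, similarity-weighted document network, spectral community detection) can be illustrated with a small sketch. The abstract does not name the specific algorithm or similarity measure used, so everything below is an assumption for illustration only: raw term-frequency vectors, cosine similarity as the edge weight, and a single bisection using the leading eigenvector of the modularity matrix (Newman's spectral method) as a stand-in for the paper's community detection step. All function names are hypothetical.

```python
import math
from collections import Counter


def term_vectors(docs):
    """Vector space model: one bag-of-words term-frequency vector per document."""
    return [Counter(doc.lower().split()) for doc in docs]


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity_graph(docs):
    """Weighted adjacency matrix: nodes are documents, edge weights are similarities."""
    vecs = term_vectors(docs)
    n = len(vecs)
    return [[cosine(vecs[i], vecs[j]) if i != j else 0.0 for j in range(n)]
            for i in range(n)]


def modularity_matrix(W):
    """B_ij = W_ij - k_i * k_j / (2m), where k_i is the weighted degree of node i."""
    n = len(W)
    k = [sum(row) for row in W]
    two_m = sum(k)
    if two_m == 0:
        raise ValueError("graph has no edges; documents share no terms")
    return [[W[i][j] - k[i] * k[j] / two_m for j in range(n)] for i in range(n)]


def leading_eigenvector(B, iters=500):
    """Power iteration on B + shift*I; the shift (a Gershgorin bound) makes all
    eigenvalues non-negative, so the iteration converges to the eigenvector of
    the largest (most positive) eigenvalue of B."""
    n = len(B)
    shift = max(sum(abs(x) for x in row) for row in B)
    v = [1.0 + 0.1 * i for i in range(n)]  # deterministic, non-uniform start
    for _ in range(iters):
        w = [sum(B[i][j] * v[j] for j in range(n)) + shift * v[i] for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def bisect_documents(docs):
    """Split documents into two communities by the sign of the leading eigenvector."""
    v = leading_eigenvector(modularity_matrix(similarity_graph(docs)))
    return [0 if x >= 0 else 1 for x in v]


# Toy corpus (made up for this sketch, not from the 20 Newsgroups data).
docs = [
    "graph network community detection",
    "community detection in network graph",
    "network graph community",
    "football match goal score",
    "goal score in football match",
    "football goal match",
]
labels = bisect_documents(docs)
print(labels)
```

On this toy corpus the first three documents should receive one label and the last three the other; which group is labelled 0 depends on the sign of the eigenvector and is not meaningful. A real experiment would likely use TF-IDF weighting, a sparse eigensolver, and repeated bisection to obtain more than two clusters.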

Citation (APA)

Dos Santos, C. K., Evsukoff, A. G., & De Lima, B. S. L. P. (2008). Cluster analysis in document networks. In WIT Transactions on Information and Communication Technologies (Vol. 40, pp. 95–104). https://doi.org/10.2495/DATA080101
