A statistics-based semantic relation analysis approach for document clustering

Xin Cheng; Duo Qian Miao; Lei Wang

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2014), vol. 8818, pp. 332-342

DOI: 10.1007/978-3-319-11740-9_31

Abstract

Document clustering is a widely researched topic in machine learning. A number of approaches have been proposed to represent and cluster documents. One recent trend in document clustering research is to incorporate semantic information into the document representation. In this paper, we introduce a novel technique for capturing robust and reliable semantic information from term-term co-occurrence statistics. First, we propose a novel method to evaluate the explicit semantic relation between terms from their co-occurrence information. Then, the underlying (implicit) semantic relation between terms is captured through their interaction with other terms. Finally, these two complementary semantic relations are integrated to capture the complete semantic information in the original documents. Experimental results show that clustering performance improves significantly when the document representation is enriched with this semantic information.
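
The three steps named in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's formulas, so every measure here is an assumption chosen for illustration: Jaccard overlap of document sets stands in for the explicit relation, cosine similarity between two terms' relation profiles over all other terms stands in for the underlying relation, and a hypothetical weight `alpha` blends the two. The function names and the toy corpus are likewise invented for this example and are not taken from the paper.

```python
from math import sqrt

def explicit_relation(docs, vocab):
    # Assumed measure: Jaccard overlap of the sets of documents containing each term.
    doc_sets = {t: {i for i, d in enumerate(docs) if t in d} for t in vocab}
    n = len(vocab)
    rel = [[0.0] * n for _ in range(n)]
    for a, ta in enumerate(vocab):
        for b, tb in enumerate(vocab):
            if a == b:
                rel[a][b] = 1.0
                continue
            union = doc_sets[ta] | doc_sets[tb]
            rel[a][b] = len(doc_sets[ta] & doc_sets[tb]) / len(union) if union else 0.0
    return rel

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = sqrt(sum(x * x for x in u))
    norm_v = sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def implicit_relation(explicit):
    # Assumed measure: cosine between two terms' explicit relations to all OTHER terms.
    n = len(explicit)
    rel = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(n):
            if a == b:
                rel[a][b] = 1.0
                continue
            others = [k for k in range(n) if k not in (a, b)]
            rel[a][b] = cosine([explicit[a][k] for k in others],
                               [explicit[b][k] for k in others])
    return rel

def combine(explicit, implicit, alpha=0.5):
    # Hypothetical integration step: a simple weighted average of the two relations.
    return [[alpha * e + (1 - alpha) * i for e, i in zip(e_row, i_row)]
            for e_row, i_row in zip(explicit, implicit)]

def enrich(docs, vocab, relation):
    # Spread each document's raw term counts onto related terms.
    index = {t: i for i, t in enumerate(vocab)}
    n = len(vocab)
    enriched_docs = []
    for d in docs:
        counts = [0.0] * n
        for t in d:
            counts[index[t]] += 1.0
        enriched_docs.append([sum(counts[k] * relation[k][j] for k in range(n))
                              for j in range(n)])
    return enriched_docs

# Toy corpus (invented): "car" and "automobile" never co-occur but share "engine".
docs = [["car", "engine"], ["automobile", "engine"], ["banana", "fruit"]]
vocab = sorted({t for d in docs for t in d})
explicit = explicit_relation(docs, vocab)
implicit = implicit_relation(explicit)
combined = combine(explicit, implicit)
enriched = enrich(docs, vocab, combined)

car, auto = vocab.index("car"), vocab.index("automobile")
print(explicit[car][auto], implicit[car][auto], combined[car][auto])
```

In this toy corpus the explicit relation between "car" and "automobile" is zero because they never share a document, while the profile-based relation links them through "engine". The enriched vector for the first document therefore carries weight on "automobile", which is the kind of effect the abstract attributes to combining the two relations. The enriched vectors could then be passed to any standard clustering algorithm.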

Citation (APA)

Cheng, X., Miao, D. Q., & Wang, L. (2014). A statistics-based semantic relation analysis approach for document clustering. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8818, pp. 332–342). Springer Verlag. https://doi.org/10.1007/978-3-319-11740-9_31
