High-order co-clustering text data on semantics-based representation model

Liping Jing; Jiali Yun; Jian Yu; Joshua Huang

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2011) 6634 LNAI(PART 1) 171-182

DOI: 10.1007/978-3-642-20841-6_15

12 Citations

7 Readers

Abstract

In recent years, the language modeling approach has been widely used to improve text mining performance because of its solid theoretical foundation and empirical effectiveness. In essence, this approach centers on estimating an accurate model by choosing appropriate language models and smoothing techniques. Semantic smoothing, which incorporates semantic and contextual information into the language models, is an effective and promising way to improve text mining performance. In this paper, we propose a high-order structure that represents text data by incorporating background knowledge from Wikipedia. The proposed structure consists of three types of objects: terms, documents, and concepts. Moreover, for the first time, we combine a high-order co-clustering algorithm with the proposed model to cluster documents, terms, and concepts simultaneously. Experimental results on benchmark data sets (20Newsgroups and Reuters-21578) show that the proposed high-order co-clustering on the high-order structure outperforms general co-clustering algorithms on bipartite text data, such as document-term, document-concept, and document-(term+concept). © 2011 Springer-Verlag.
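As a rough illustration of the representation described in the abstract, the sketch below builds the relation matrices for a toy corpus. It is not the authors' implementation. The corpus, the concept labels, the raw-count weighting, and the helper name `build_relation` are all hypothetical, and the sketch assumes that documents link to both terms and concepts. The abstract does not say how concepts are extracted from Wikipedia or how entries are weighted.

```python
import numpy as np

# Hypothetical toy corpus: each document lists its terms and the
# Wikipedia concepts assumed to have been mapped to it beforehand.
docs = [
    {"terms": ["goal", "match", "team"], "concepts": ["Football"]},
    {"terms": ["team", "coach", "match"], "concepts": ["Football", "Coaching"]},
    {"terms": ["stock", "market", "share"], "concepts": ["Finance"]},
]


def build_relation(docs, key):
    """Return (sorted vocabulary, document-by-object count matrix) for one relation."""
    vocab = sorted({item for d in docs for item in d[key]})
    index = {item: j for j, item in enumerate(vocab)}
    matrix = np.zeros((len(docs), len(vocab)), dtype=int)
    for i, d in enumerate(docs):
        for item in d[key]:
            matrix[i, index[item]] += 1
    return vocab, matrix


# Two relations sharing the document dimension: the high-order view
# keeps them separate so terms and concepts can be clustered as
# distinct object types alongside documents.
terms, doc_term = build_relation(docs, "terms")
concepts, doc_concept = build_relation(docs, "concepts")

# Bipartite baseline document-(term+concept): the two feature
# spaces are simply concatenated into one matrix.
doc_term_concept = np.hstack([doc_term, doc_concept])

print(doc_term.shape, doc_concept.shape, doc_term_concept.shape)
```

Keeping `doc_term` and `doc_concept` as separate matrices is what distinguishes the three-type structure from the concatenated document-(term+concept) baseline, where terms and concepts are treated as a single feature set.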

Cite

Jing, L., Yun, J., Yu, J., & Huang, J. (2011). High-order co-clustering text data on semantics-based representation model. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 6634 LNAI, pp. 171–182). Springer Verlag. https://doi.org/10.1007/978-3-642-20841-6_15
