Semi-supervised document clustering with simultaneous text representation and categorization

Yanhua Chen; Lijun Wang; Ming Dong

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2009), Vol. 5781 LNAI (Part 1), pp. 211–226

DOI: 10.1007/978-3-642-04180-8_31

Abstract

In order to derive high-quality information from text, the field of text mining has advanced swiftly from simple document clustering to co-clustering with words and categories. However, document co-clustering without any prior knowledge or background information is a challenging problem. In this paper, we propose a Semi-Supervised Non-negative Matrix Factorization (SS-NMF) framework for document co-clustering. Our method computes new word-document and document-category matrices by incorporating user-provided constraints through simultaneous distance metric learning and modality selection. Using an iterative algorithm, we perform tri-factorization of the new matrices to infer the document, category and word clusters. Theoretically, we show the convergence and correctness of SS-NMF co-clustering and the advantages of SS-NMF co-clustering over existing approaches. Through extensive experiments conducted on publicly available data sets, we demonstrate the superior performance of SS-NMF for document co-clustering. © 2009 Springer.
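The tri-factorization step mentioned in the abstract can be illustrated with a minimal sketch, assuming plain, unconstrained nonnegative matrix tri-factorization X ≈ F S Gᵀ of a single word-document matrix, fitted with the standard multiplicative updates for the squared Frobenius loss. It omits everything that makes SS-NMF semi-supervised (user constraints, distance metric learning, modality selection, and the document-category matrix), so it should not be read as the authors' algorithm. All function names (tri_factorize, frob_error, cluster_labels and the small matrix helpers) are hypothetical. Rows of F are read as word-cluster memberships and rows of G as document-cluster memberships, taking the argmax of each row.

```python
import random


def transpose(a):
    """Transpose a list-of-lists matrix."""
    return [list(row) for row in zip(*a)]


def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def frob_error(x, f, s, g):
    """Squared Frobenius norm of X - F S G^T."""
    approx = matmul(matmul(f, s), transpose(g))
    return sum((xv - av) ** 2
               for xr, ar in zip(x, approx) for xv, av in zip(xr, ar))


def multiplicative_update(m, num, den, eps=1e-9):
    """Elementwise m * num / den; eps guards against division by zero."""
    return [[mv * nv / (dv + eps) for mv, nv, dv in zip(mr, nr, dr)]
            for mr, nr, dr in zip(m, num, den)]


def tri_factorize(x, k_rows, k_cols, iters=200, seed=0):
    """Hypothetical sketch: fit X (n x m) ~ F (n x k_rows) S (k_rows x k_cols) G^T.

    G is m x k_cols. Uses unconstrained multiplicative updates, which keep
    all factors nonnegative when X and the random initialization are.
    """
    rng = random.Random(seed)
    n, m = len(x), len(x[0])
    f = [[rng.random() for _ in range(k_rows)] for _ in range(n)]
    s = [[rng.random() for _ in range(k_cols)] for _ in range(k_rows)]
    g = [[rng.random() for _ in range(k_cols)] for _ in range(m)]
    xt = transpose(x)
    for _ in range(iters):
        # F <- F * (X G S^T) / (F S G^T G S^T)
        gtg = matmul(transpose(g), g)
        num = matmul(matmul(x, g), transpose(s))
        den = matmul(matmul(matmul(f, s), gtg), transpose(s))
        f = multiplicative_update(f, num, den)
        # G <- G * (X^T F S) / (G S^T F^T F S)
        ft = transpose(f)
        ftf = matmul(ft, f)
        num = matmul(matmul(xt, f), s)
        den = matmul(matmul(matmul(g, transpose(s)), ftf), s)
        g = multiplicative_update(g, num, den)
        # S <- S * (F^T X G) / (F^T F S G^T G)
        gtg = matmul(transpose(g), g)
        num = matmul(matmul(ft, x), g)
        den = matmul(matmul(ftf, s), gtg)
        s = multiplicative_update(s, num, den)
    return f, s, g


def cluster_labels(factor):
    """Assign each row of a factor matrix to its largest-weight cluster."""
    return [max(range(len(row)), key=row.__getitem__) for row in factor]


if __name__ == "__main__":
    # Toy word-document count matrix (rows = words, columns = documents).
    x = [[3, 2, 0, 0],
         [2, 3, 0, 0],
         [0, 0, 4, 1],
         [0, 0, 1, 4]]
    f, s, g = tri_factorize(x, 2, 2)
    print("document clusters:", cluster_labels(g))
    print("word clusters:", cluster_labels(f))
```

Multiplicative updates are used here because they preserve nonnegativity without an explicit projection step and do not increase the reconstruction error; the trade-offs are slow convergence and sensitivity to the random initialization, so in practice several seeds would typically be tried.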

Citation (APA)

Chen, Y., Wang, L., & Dong, M. (2009). Semi-supervised document clustering with simultaneous text representation and categorization. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5781 LNAI, pp. 211–226). https://doi.org/10.1007/978-3-642-04180-8_31