Abstract
In order to derive high quality information from text, the field of text mining has advanced swiftly from simple document clustering to co-clustering with words and categories. However, document co-clustering without any prior knowledge or background information is a challenging problem. In this paper, we propose a Semi-Supervised Non-negative Matrix Factorization (SS-NMF) framework for document co-clustering. Our method computes new word-document and document-category matrices by incorporating user provided constraints through simultaneous distance metric learning and modality selection. Using an iterative algorithm, we perform tri-factorization of the new matrices to infer the document, category and word clusters. Theoretically, we show the convergence and correctness of SS-NMF co-clustering and the advantages of SS-NMF co-clustering over existing approaches. Through extensive experiments conducted on publicly available data sets, we demonstrate the superior performance of SS-NMF for document co-clustering. © 2009 Springer.
Author supplied keywords
Cite
CITATION STYLE
Chen, Y., Wang, L., & Dong, M. (2009). Semi-supervised document clustering with simultaneous text representation and categorization. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5781 LNAI, pp. 211–226). https://doi.org/10.1007/978-3-642-04180-8_31
Register to see more suggestions
Mendeley helps you to discover research relevant for your work.