Improving document clustering for short texts by long documents via a dirichlet multinomial allocation model

Yingying Yan; Ruizhang Huang; Can Ma; Liyang Xu; Zhiyuan Ding; Rui Wang; Ting Huang; Bowei Liu

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2017) 10366 LNCS 626-641

DOI: 10.1007/978-3-319-63579-8_47

Abstract

Document clustering for short texts has received considerable interest. Traditional document clustering approaches are designed for long documents and perform poorly on short texts because short-text representations are sparse. We observe that words appearing in long documents can enrich the context of short texts and improve short-text clustering performance. In this paper, we propose a novel model, DDMAfs, which (1) improves the clustering performance of short texts by transferring structural knowledge from long documents to short texts; (2) automatically identifies the number of clusters; and (3) separates discriminative words from irrelevant words in long documents to obtain high-quality structural knowledge. Our experiments indicate that the DDMAfs model performs well on a synthetic dataset and on real datasets. Comparisons between the DDMAfs model and state-of-the-art short text clustering approaches show that the DDMAfs model is effective.

Cite

Yan, Y., Huang, R., Ma, C., Xu, L., Ding, Z., Wang, R., … Liu, B. (2017). Improving document clustering for short texts by long documents via a dirichlet multinomial allocation model. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10366 LNCS, pp. 626–641). Springer Verlag. https://doi.org/10.1007/978-3-319-63579-8_47
