Improving document clustering for short texts by long documents via a Dirichlet multinomial allocation model

Abstract

Document clustering for short texts has received considerable interest. Traditional document clustering approaches are designed for long documents and perform poorly on short texts because of the sparseness of short-text representations. We observe that words appearing in long documents can enrich the context of short texts and thereby improve clustering performance on them. In this paper, we propose a novel model, DDMAfs, which (1) improves the clustering of short texts by sharing structural knowledge from long documents with short texts; (2) automatically determines the number of clusters; and (3) separates discriminative words from irrelevant words in long documents to obtain high-quality structural knowledge. Our experiments indicate that DDMAfs performs well on both a synthetic dataset and real datasets, and comparisons with state-of-the-art short text clustering approaches show that DDMAfs is effective.
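The abstract does not describe DDMAfs's inference procedure, so what follows is only a minimal sketch of the general model family it builds on: a collapsed Gibbs sampler for a plain Dirichlet multinomial mixture, in the style of GSDMM. It assumes each document, long or short, is generated by exactly one cluster. It approximates automatic cluster-number selection by starting from an upper bound `k_max` and letting clusters empty out. All names here (`dmm_gibbs`, `alpha`, `beta`, `k_max`, `iters`, and the toy corpus) are hypothetical. Pooling long documents with short texts in a single sampler is a crude stand-in for knowledge sharing, not the paper's method, and the sketch omits the separation of discriminative from irrelevant words.

```python
import math
import random
from collections import Counter


def dmm_gibbs(docs, vocab_size, k_max=10, alpha=0.1, beta=0.1, iters=30, seed=0):
    """Hypothetical GSDMM-style collapsed Gibbs sampler for a Dirichlet
    multinomial mixture. Not DDMAfs. docs is a list of token lists; returns
    one cluster label per document. Empty clusters simply disappear, so the
    number of distinct labels is the inferred cluster count (<= k_max)."""
    rng = random.Random(seed)
    m_z = [0] * k_max                          # documents per cluster
    n_z = [0] * k_max                          # tokens per cluster
    n_zw = [Counter() for _ in range(k_max)]   # word counts per cluster
    counts = [Counter(d) for d in docs]
    labels = [0] * len(docs)

    def update(d, z, sign):
        # Add (sign=+1) or remove (sign=-1) document d's counts from cluster z.
        m_z[z] += sign
        n_z[z] += sign * len(docs[d])
        for w, c in counts[d].items():
            n_zw[z][w] += sign * c

    # Random initial assignment.
    for d in range(len(docs)):
        labels[d] = rng.randrange(k_max)
        update(d, labels[d], 1)

    for _ in range(iters):
        for d in range(len(docs)):
            update(d, labels[d], -1)  # exclude d from the counts
            log_p = []
            for z in range(k_max):
                # Cluster-popularity term (its normaliser is constant in z).
                lp = math.log(m_z[z] + alpha)
                # Word-likelihood numerator, handling repeated words.
                for w, c in counts[d].items():
                    for j in range(c):
                        lp += math.log(n_zw[z][w] + beta + j)
                # Word-likelihood denominator.
                for i in range(len(docs[d])):
                    lp -= math.log(n_z[z] + vocab_size * beta + i)
                log_p.append(lp)
            top = max(log_p)
            weights = [math.exp(lp - top) for lp in log_p]  # stable softmax
            labels[d] = rng.choices(range(k_max), weights=weights)[0]
            update(d, labels[d], 1)
    return labels


# Hypothetical toy data: long documents pooled with short texts.
long_docs = [
    ["python", "code", "compiler", "bug", "code", "python", "debug"],
    ["compiler", "code", "python", "debug", "bug", "code"],
    ["soccer", "goal", "match", "team", "goal", "league", "match"],
    ["team", "league", "soccer", "match", "goal", "coach"],
]
short_texts = [["python", "bug"], ["goal", "team"], ["code", "debug"], ["match", "coach"]]

corpus = long_docs + short_texts
vocab = {w for doc in corpus for w in doc}
labels = dmm_gibbs(corpus, len(vocab), k_max=10)
short_labels = labels[len(long_docs):]
print("short-text clusters:", short_labels)
print("inferred number of clusters:", len(set(labels)))
```

Each long document contributes many word counts to its cluster. In this pooled setup, the long documents therefore dominate the cluster-word statistics that the sparse short texts are scored against. The printed labels depend on the seed and are not guaranteed to separate the two toy topics.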

Yan, Y., Huang, R., Ma, C., Xu, L., Ding, Z., Wang, R., … Liu, B. (2017). Improving document clustering for short texts by long documents via a Dirichlet multinomial allocation model. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10366 LNCS, pp. 626–641). Springer Verlag. https://doi.org/10.1007/978-3-319-63579-8_47