Document clustering for short texts has received considerable interest. Traditional document clustering approaches are designed for long documents and perform poorly for short texts due to the their sparseness representation. To better understand short texts, we observe that words that appear in long documents can enrich short text context and improve the clustering performance for short texts. In this paper, we propose a novel model, namely DDMAfs, which (1) improves the clustering performance of short texts by sharing structural knowledge of long documents to short texts; (2) automatically identifies the number of clusters; (3) separates discriminative words from irrelevant words for long documents to obtain high quality structural knowledge. Our experiments indicate that the DDMAfs model performs well on the synthetic dataset and real datasets. Comparisons between the DDMAfs model and state-of-the-art short text clustering approaches show that the DDMAfs model is effective.
CITATION STYLE
Yan, Y., Huang, R., Ma, C., Xu, L., Ding, Z., Wang, R., … Liu, B. (2017). Improving document clustering for short texts by long documents via a dirichlet multinomial allocation model. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10366 LNCS, pp. 626–641). Springer Verlag. https://doi.org/10.1007/978-3-319-63579-8_47
Mendeley helps you to discover research relevant for your work.