Content quality of clustered latent Dirichlet allocation short summaries


Abstract

Latent Dirichlet Allocation (LDA) is a commonly used topic-model-based summarisation method. However, the generated summaries contain words that are somewhat general and unrelated to the topic. Since the summary depends on the word distribution in the input documents, and because the topic signature feature values are averaged across all documents, we think clustering can help to overcome this problem. Therefore, this work sets out to investigate whether clustering the input documents beforehand (clusLDA) improves the content quality of the generated summaries. The words in an LDA summary are weighted, and a short summary of 0.67% of the input text size is constructed from significant words drawn proportionately from the clustered summaries. The divergence probabilities of the resulting summaries are compared against the summary produced by LDA without clustering (UnclusLDA). The results are validated using input texts of various sizes and different clustering techniques. Our findings indicate that clustering does not necessarily improve the content quality of short summaries.
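The sketch below illustrates one plausible reading of the clusLDA pipeline described in the abstract: cluster the input documents, run LDA within each cluster, weight the words, and draw significant words proportionately to cluster size into a short summary whose divergence from the input distribution can then be measured. The choice of k-means clustering, the numbers of clusters and topics, the word-weighting scheme, and the use of KL divergence as the "divergence probability" are assumptions made for illustration only; the paper's exact settings may differ.

```python
# Hypothetical clusLDA sketch using scikit-learn and SciPy.
# All parameter choices here are illustrative assumptions, not the paper's settings.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.cluster import KMeans
from sklearn.decomposition import LatentDirichletAllocation
from scipy.stats import entropy


def lda_word_weights(docs, n_topics=5):
    """Fit LDA on a list of documents and return a {word: weight} map,
    where a word's weight is its probability summed over all topics."""
    vec = CountVectorizer(stop_words="english")
    counts = vec.fit_transform(docs)
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
    lda.fit(counts)
    topic_word = lda.components_ / lda.components_.sum(axis=1, keepdims=True)
    return dict(zip(vec.get_feature_names_out(), topic_word.sum(axis=0)))


def clus_lda_summary(docs, n_clusters=3, summary_words=50):
    """Cluster the documents, run LDA per cluster, and draw top-weighted
    words proportionately to cluster size (the paper's 0.67% budget is
    expressed here as an explicit word count)."""
    vec = CountVectorizer(stop_words="english")
    X = vec.fit_transform(docs)
    labels = KMeans(n_clusters=n_clusters, random_state=0, n_init=10).fit_predict(X)

    summary = []
    for c in range(n_clusters):
        cluster_docs = [d for d, lab in zip(docs, labels) if lab == c]
        if not cluster_docs:
            continue
        share = int(round(summary_words * len(cluster_docs) / len(docs)))
        weights = lda_word_weights(cluster_docs)
        summary.extend(sorted(weights, key=weights.get, reverse=True)[:share])
    return summary


def divergence(summary_words, docs):
    """KL divergence between the summary's word distribution and the
    input word distribution -- one plausible interpretation of the
    abstract's 'divergence probabilities'."""
    vec = CountVectorizer(stop_words="english")
    q = np.asarray(vec.fit_transform([" ".join(docs)]).sum(axis=0)).ravel() + 1e-12
    p = np.asarray(vec.transform([" ".join(summary_words)]).sum(axis=0)).ravel() + 1e-12
    return entropy(p / p.sum(), q / q.sum())
```

Under this reading, the unclustered baseline (UnclusLDA) would simply call lda_word_weights on the whole collection and take the top summary_words words, so the two variants can be compared with the same divergence function.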

Cite

CITATION STYLE

APA

Annamalai, M., & Farah Nasehah Mukhlis, S. (2014). Content quality of clustered latent Dirichlet allocation short summaries. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 8870, 494–504. https://doi.org/10.1007/978-3-319-12844-3_42
