Content quality of clustered latent dirichlet allocation short summaries

Muthukkaruppan Annamalai; Siti Farah Nasehah Mukhlis

Journal Article

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2014) 8870 494-504

DOI: 10.1007/978-3-319-12844-3_42

Abstract

Latent Dirichlet Allocation (LDA) is a commonly used topic-model-based summarisation method. However, the summaries it generates contain words that are somewhat general and unrelated to the topic. Because the summary depends on the word distribution in the input documents, and because the topic signature feature values are averaged across all documents, we hypothesise that clustering can help overcome this problem. This work therefore investigates whether clustering the input documents beforehand (clusLDA) improves the content quality of the generated summaries. The words in an LDA summary are weighted, and a short summary of 0.67% of the input text size is constructed from significant words drawn proportionately from the clustered summaries. The divergence probabilities of the resulting summaries are compared against those of the summary produced by LDA without clustering (UnclusLDA). The results are validated using inputs of various text sizes and different clustering techniques. Our findings indicate that clustering does not necessarily improve the content quality of short summaries.
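The short-summary construction described above can be pictured with a minimal Python sketch. It is not the authors' implementation. Several pieces are assumptions made for illustration: the per-cluster word weights are taken as given (for example, LDA topic-word probabilities); "proportionately" is read as each cluster's share of the word budget being proportional to its input size in tokens; and Jensen-Shannon divergence stands in for the divergence measure, which the abstract does not name. All function names are hypothetical.

```python
from collections import Counter
import math

SUMMARY_RATIO = 0.0067  # 0.67% of the input text size, as stated in the abstract


def word_budget(input_token_count: int, ratio: float = SUMMARY_RATIO) -> int:
    """Number of words allowed in the short summary (at least one)."""
    return max(1, round(input_token_count * ratio))


def allocate_proportionally(cluster_sizes: list[int], budget: int) -> list[int]:
    """Split the word budget across clusters in proportion to cluster size.

    Assumption: shares follow cluster size in tokens. Largest-remainder
    rounding makes the shares sum exactly to the budget.
    """
    total = sum(cluster_sizes)
    raw = [budget * size / total for size in cluster_sizes]
    shares = [math.floor(r) for r in raw]
    leftover = budget - sum(shares)
    # Hand the remaining words to the clusters with the largest fractional parts.
    order = sorted(range(len(raw)), key=lambda i: raw[i] - shares[i], reverse=True)
    for i in order[:leftover]:
        shares[i] += 1
    return shares


def short_summary(cluster_weights: list[dict[str, float]],
                  cluster_sizes: list[int],
                  input_token_count: int) -> list[str]:
    """Take each cluster's top-weighted words up to its share of the budget."""
    shares = allocate_proportionally(cluster_sizes, word_budget(input_token_count))
    summary: list[str] = []
    for weights, share in zip(cluster_weights, shares):
        ranked = sorted(weights, key=weights.get, reverse=True)
        # Skip words already contributed by an earlier cluster.
        summary.extend([w for w in ranked if w not in summary][:share])
    return summary


def distribution(tokens: list[str]) -> dict[str, float]:
    """Relative word frequencies of a token list."""
    counts = Counter(tokens)
    n = sum(counts.values())
    return {w: c / n for w, c in counts.items()}


def js_divergence(p: dict[str, float], q: dict[str, float]) -> float:
    """Jensen-Shannon divergence in bits (0 = identical, 1 = disjoint).

    Assumption: used here as a stand-in for the paper's divergence measure.
    """
    vocab = set(p) | set(q)
    m = {w: 0.5 * (p.get(w, 0.0) + q.get(w, 0.0)) for w in vocab}

    def kl_to_m(a: dict[str, float]) -> float:
        return sum(a[w] * math.log2(a[w] / m[w]) for w in a if a[w] > 0)

    return 0.5 * kl_to_m(p) + 0.5 * kl_to_m(q)


if __name__ == "__main__":
    weights = [{"lda": 0.9, "topic": 0.7, "the": 0.1},
               {"cluster": 0.8, "summary": 0.6}]
    # 300 input tokens -> budget of 2 words, split 1/1 across clusters of 200 and 100 tokens.
    print(short_summary(weights, [200, 100], 300))  # ['lda', 'cluster']
```

A clustered summary (clusLDA) and an unclustered one (UnclusLDA) could then each be scored with `js_divergence(distribution(input_tokens), distribution(summary))`, where lower values mean the summary's word distribution is closer to the input's. Largest-remainder rounding is chosen so that small budgets, such as the two-word example, are never silently lost to truncation.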

Cite (APA)

Annamalai, M., & Farah Nasehah Mukhlis, S. (2014). Content quality of clustered latent dirichlet allocation short summaries. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 8870, 494–504. https://doi.org/10.1007/978-3-319-12844-3_42