Context-Aware Latent Dirichlet Allocation for Topic Segmentation

Wenbo Li; Tetsu Matsukawa; Hiroto Saigo; Einoshin Suzuki

Conference Proceedings (Open Access)

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2020) 12084 LNAI 475-486

DOI: 10.1007/978-3-030-47426-3_37

Abstract

We propose a new generative model for topic segmentation based on Latent Dirichlet Allocation. The task is to divide a document into a sequence of topically coherent segments while preserving long topic change-points (coherency) and keeping short topic segments from being merged (saliency). Most existing models either fuse topic segments by keywords or focus on modeling word co-occurrence patterns without merging. They can hardly achieve both coherency and saliency, since many words have highly uncertain topic assignments owing to their polysemous nature. To solve this problem, we model the topic-specific co-occurrence of word pairs within contexts, which generates more coherent segments and alleviates the influence of irrelevant words on topic assignment. We also design an optimization algorithm to eliminate redundant items in the generated topic segments. Experimental results show that our proposal produces significant improvements in both topic coherence and topic segmentation.

Citation (APA)

Li, W., Matsukawa, T., Saigo, H., & Suzuki, E. (2020). Context-Aware Latent Dirichlet Allocation for Topic Segmentation. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 12084 LNAI, pp. 475–486). Springer. https://doi.org/10.1007/978-3-030-47426-3_37
