Text Similarity Computing Based on LDA Topic Model and Word Co-occurrence

Minglai Shao; Liangxi Qin

Conference Proceedings, Open Access

Proceedings of the 2nd International Conference on Software Engineering, Knowledge Engineering and Information Engineering (SEKEIE 2014) (2014) 114

DOI: 10.2991/sekeie-14.2014.47

Abstract

The LDA (Latent Dirichlet Allocation) topic model has been widely applied to text clustering owing to its efficient dimensionality reduction. The prevalent approach is to model the text collection with LDA, perform inference by Gibbs sampling, and compute text similarity with the JS (Jensen-Shannon) distance. However, the JS distance cannot distinguish semantic associations among text topics. To address this defect, a new text similarity algorithm based on a hidden topic model and word co-occurrence analysis is introduced. Experiments are carried out to verify the clustering effect of the improved algorithm. The results show that the method effectively improves both the text similarity computation and text clustering accuracy.
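
As a rough illustration of the baseline the abstract describes, the following sketch computes the Jensen-Shannon divergence between two document-topic distributions. It is not taken from the paper: the function names and the toy distributions are hypothetical, and the topic vectors are assumed to have already been inferred (for example by Gibbs sampling). Some authors take the square root of this quantity and call the result the JS distance; the abstract does not say which variant is used.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) in bits; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions."""
    # Mixture distribution; it is positive wherever p or q is positive,
    # so the KL terms below never divide by zero.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

# Hypothetical document-topic distributions over three topics.
doc_a = [1.0, 0.0, 0.0]  # entirely topic 0
doc_b = [0.0, 1.0, 0.0]  # entirely topic 1
doc_c = [0.0, 0.0, 1.0]  # entirely topic 2

# Both pairs receive the same score, whichever topics are semantically close.
score_ab = js_divergence(doc_a, doc_b)
score_ac = js_divergence(doc_a, doc_c)
```

In this toy case `score_ab` and `score_ac` are equal, because the measure compares probability mass position by position and has no notion of how related two topics are. This is one plausible reading of the defect the abstract attributes to the JS distance.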

Citation (APA)

Shao, M., & Qin, L. (2014). Text Similarity Computing Based on LDA Topic Model and Word Co-occurrence. In Proceedings of the 2nd International Conference on Software Engineering, Knowledge Engineering and Information Engineering (SEKEIE 2014) (Vol. 114). Atlantis Press. https://doi.org/10.2991/sekeie-14.2014.47
