Collaboratively Modeling and Embedding of Latent Topics for Short Texts

15 citations · 29 Mendeley readers

This article is free to access.

Abstract

Deriving a good document representation is a critical challenge in many downstream NLP tasks, especially when documents are very short. Short texts are difficult to handle because of their sparsity and noise. Some approaches employ latent topic models, based on global word co-occurrence, and use the topic distribution as the representation; others leverage word embeddings, which capture local conditional dependencies, and map a document to the sum of its word vectors. Unlike existing works that use one to help the other, i.e., topic models for word embeddings or vice versa, we propose CME-DMM, a collaborative modeling and embedding framework for capturing coherent latent topics from short texts. CME-DMM incorporates topic and word embeddings through an attention mechanism and implants them into the latent topic model, which significantly improves the quality of the latent topics. Extensive experiments demonstrate that CME-DMM discovers more coherent topics than other popular methods, resulting in better performance on downstream NLP tasks such as classification. Beyond the interpretable latent topics, the corresponding topic embeddings describe the meanings of the latent topics in the semantic space, and the attention vectors, a by-product of the learning process, identify the keywords in noisy short texts.
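To illustrate the attention idea described in the abstract, the sketch below shows how attention weights over word embeddings, scored against a topic embedding, can both pool a short text into a single document vector and surface its keywords. This is a minimal NumPy illustration under assumed inputs (random toy embeddings; in CME-DMM both word and topic embeddings are learned jointly with the topic model), not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 4

# Hypothetical toy embeddings; CME-DMM learns these jointly with the topic model.
word_emb = {w: rng.normal(size=dim) for w in ["stock", "market", "game", "team"]}
topic_emb = rng.normal(size=dim)  # embedding of one latent topic

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(doc_words, topic_vec):
    """Attention over a short text: words aligned with the topic get more weight.

    Returns the attention weights (keyword scores) and the attention-pooled
    document vector, instead of a plain unweighted sum of word vectors.
    """
    E = np.stack([word_emb[w] for w in doc_words])  # (n_words, dim)
    scores = E @ topic_vec                          # relevance of each word to the topic
    alpha = softmax(scores)                         # attention weights, sum to 1
    doc_vec = alpha @ E                             # weighted sum of word embeddings
    return alpha, doc_vec

alpha, doc_vec = attend(["stock", "market", "team"], topic_emb)
# The largest entry of `alpha` marks the word most indicative of the topic,
# which is how attention vectors can identify keywords in noisy short texts.
```

The attention-pooled `doc_vec` replaces the plain summation representation, down-weighting noise words that do not align with the topic embedding.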

Citation (APA)

Liu, Z., Qin, T., Chen, K. J., & Li, Y. (2020). Collaboratively Modeling and Embedding of Latent Topics for Short Texts. IEEE Access, 8, 99141–99153. https://doi.org/10.1109/ACCESS.2020.2997973
