Topic models are widely used to extract the latent knowledge of short texts. However, due to data sparsity, traditional topic models based on word co-occurrence patterns struggle to achieve accurate results on short texts. Researchers have recently proposed knowledge-based topic models for short texts to discover more coherent and meaningful topics. Each form of knowledge represents specific, oriented information but is not wide-ranging on its own. Topic models enhanced with a single form of knowledge take only that one form into account, which is restrictive and undesirable. The more forms of knowledge we incorporate, the more comprehensive our understanding of the short text. In this paper, we propose a novel topic model for short texts, named MultiKE-DMM, which combines multiple forms of knowledge and the generalized Pólya urn (GPU) model with the Dirichlet Multinomial Mixture (DMM) model. The proposed approach boosts words related through the multi-knowledge background under the same topic. Access to multi-form knowledge permits an intelligent topic modelling algorithm that considers both semantic and fact-oriented relationships between words, offering improved performance over four comparison models on four real-world short-text datasets.
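To make the mechanism concrete, the following is a minimal, self-contained sketch of a GPU-enhanced DMM Gibbs sampler. It is not the authors' implementation: the knowledge relation map `RELATED`, the promotion weight `MU`, and the toy documents are illustrative assumptions; in the paper the relations would come from multiple knowledge sources, and the sampler below also simplifies the DMM word-count bookkeeping (it assumes no repeated words within a document).

```python
import random
from collections import defaultdict

# Assumed toy knowledge map: word -> related words. In MultiKE-DMM this
# would be derived from multiple forms of knowledge, not hand-written.
RELATED = {
    "apple": ["fruit"],
    "banana": ["fruit"],
    "cpu": ["computer"],
    "ram": ["computer"],
}

MU = 0.3  # GPU promotion weight for related words (illustrative value)


def gpu_dmm(docs, n_topics=2, alpha=0.1, beta=0.01, iters=50, seed=0):
    """Sketch of a GPU-enhanced DMM Gibbs sampler (one topic per short doc)."""
    rng = random.Random(seed)
    vocab = {w for d in docs for w in d}
    V = len(vocab)

    n_z = [0] * n_topics                                  # docs per topic
    n_zw = [defaultdict(float) for _ in range(n_topics)]  # word mass per topic
    n_zsum = [0.0] * n_topics                             # total mass per topic
    z = [rng.randrange(n_topics) for _ in docs]           # topic of each doc

    def add(d, t, sign):
        """Add (+1) or remove (-1) document d's counts from topic t.
        The GPU step also promotes knowledge-related words by MU."""
        n_z[t] += sign
        for w in docs[d]:
            n_zw[t][w] += sign
            n_zsum[t] += sign
            for r in RELATED.get(w, []):
                n_zw[t][r] += sign * MU
                n_zsum[t] += sign * MU

    for d in range(len(docs)):
        add(d, z[d], +1)

    for _ in range(iters):
        for d in range(len(docs)):
            add(d, z[d], -1)
            # DMM conditional: P(z=t | rest), simplified for unique words.
            probs = []
            for t in range(n_topics):
                p = n_z[t] + alpha
                denom = n_zsum[t] + V * beta
                for i, w in enumerate(docs[d]):
                    p *= (n_zw[t][w] + beta) / (denom + i)
                probs.append(p)
            r, acc = rng.random() * sum(probs), 0.0
            for t, p in enumerate(probs):
                acc += p
                if r <= acc:
                    z[d] = t
                    break
            add(d, z[d], +1)
    return z
```

The key line is the inner loop over `RELATED.get(w, [])`: whenever a word's count under a topic changes, the counts of its knowledge-related words change by a fraction `MU` as well, which is how the generalized Pólya urn boosts related words under the same topic despite sparse co-occurrence.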
Citation:
He, J., Chen, J., & Li, M. J. (2023). Multi-knowledge Embeddings Enhanced Topic Modeling for Short Texts. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 13625 LNCS, pp. 521–532). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-30111-7_44