Abstract
Lifelong topic models mainly focus on in-domain text streams in which each chunk contains documents from only a single domain. To cope with the limited data diversity of an in-domain corpus, most existing methods exploit information from a small number of sources in a separate, heuristic manner. In this study, we develop a lifelong collaborative model (LCM) based on non-negative matrix factorization to accurately learn topics and domain-specific word embeddings. LCM particularly investigates: (1) building a knowledge graph from the semantic relationships among words during lifelong learning, so as to accumulate both the global context information discovered by topic models and the local context information reflected by context word embeddings from previous domains, and (2) building a subword graph based on byte pair encoding and pairwise word relationships to exploit subword information in the current in-domain corpus. To the best of our knowledge, we are the first to collaboratively learn topics and word embeddings via lifelong learning. Experiments on real-world in-domain text streams validate the effectiveness of our method.
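As background for the factorization underlying the abstract, the sketch below shows plain non-negative matrix factorization (V ≈ WH) on a toy term-document matrix, using the classic Lee-Seung multiplicative updates. This is an illustrative sketch only, not the paper's LCM model; the toy matrix, topic count, and iteration budget are all assumptions made for the example.

```python
# Illustrative NMF sketch (not the paper's LCM): factor a toy
# term-document count matrix V into non-negative word-topic weights W
# and topic-document weights H, so that V is approximated by W @ H.
import numpy as np

rng = np.random.default_rng(0)

# Toy term-document count matrix: 6 words x 4 documents,
# with a rough two-block (two-topic) structure.
V = np.array([
    [3, 2, 0, 0],
    [2, 3, 0, 1],
    [1, 2, 0, 0],
    [0, 0, 3, 2],
    [0, 1, 2, 3],
    [0, 0, 2, 2],
], dtype=float)

n_topics = 2
W = rng.random((V.shape[0], n_topics))  # word-topic weights
H = rng.random((n_topics, V.shape[1]))  # topic-document weights

# Lee-Seung multiplicative updates minimizing ||V - WH||_F^2;
# the small epsilon guards against division by zero.
for _ in range(200):
    H *= (W.T @ V) / (W.T @ W @ H + 1e-9)
    W *= (V @ H.T) / (W @ H @ H.T + 1e-9)

print(np.round(W @ H, 1))  # reconstruction, close to V
```

The top-weighted words in each column of W would be read off as a topic; LCM extends this basic scheme with graph-based regularization from prior domains and subword information, which the sketch omits.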
Citation
Qin, X., Lu, Y., Chen, Y., & Rao, Y. (2021). Lifelong Learning of Topics and Domain-Specific Word Embeddings. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 (pp. 2294–2309). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2021.findings-acl.202