Lifelong Learning of Topics and Domain-Specific Word Embeddings

Abstract

Lifelong topic models mainly focus on in-domain text streams in which each chunk only contains documents from a single domain. To overcome the limited data diversity of the in-domain corpus, most existing methods exploit information from limited sources in a separate and heuristic manner. In this study, we develop a lifelong collaborative model (LCM) based on non-negative matrix factorization to accurately learn topics and domain-specific word embeddings. In particular, LCM investigates: (1) developing a knowledge graph based on the semantic relationships among words during lifelong learning, so as to accumulate the global context information discovered by topic models and the local context information reflected by context word embeddings from previous domains; and (2) developing a subword graph based on byte pair encoding and pairwise word relationships to exploit the subword information of words in the current in-domain corpus. To the best of our knowledge, we are the first to collaboratively learn topics and word embeddings via lifelong learning. Experiments on real-world in-domain text streams validate the effectiveness of our method.
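
The abstract describes LCM only at a high level. As a rough, hypothetical sketch of the basic idea of deriving topics and word-level factors from a single non-negative matrix factorization (plain NMF on a term-document matrix, without the paper's knowledge-graph and subword-graph components), one might start from something like the following; the corpus, topic count, and the reading of the topic-word factor as crude word vectors are all illustrative assumptions, not the authors' implementation.

    # Minimal sketch (not the paper's LCM): NMF on a term-document matrix,
    # where the topic-word factor doubles as crude domain-specific word vectors.
    import numpy as np
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import NMF

    # Hypothetical documents from one in-domain chunk.
    docs = [
        "battery life of this laptop is great",
        "screen resolution and battery are decent",
        "the camera and screen quality look sharp",
    ]

    # Term counts for the current chunk.
    vectorizer = CountVectorizer()
    X = vectorizer.fit_transform(docs)            # shape: (n_docs, n_words)
    vocab = vectorizer.get_feature_names_out()

    # Factorize X ~ W @ H: W is document-topic, H is topic-word.
    n_topics = 2
    nmf = NMF(n_components=n_topics, init="nndsvda", random_state=0, max_iter=500)
    W = nmf.fit_transform(X)                      # (n_docs, n_topics)
    H = nmf.components_                           # (n_topics, n_words)

    # Columns of H can be read as rough word vectors in topic space; LCM would
    # additionally constrain such factors with graphs built from prior domains.
    word_vectors = {w: H[:, i] for i, w in enumerate(vocab)}

    # Inspect the top words per topic.
    for k in range(n_topics):
        top = np.argsort(-H[k])[:4]
        print(f"topic {k}:", [vocab[i] for i in top])

The sketch only shows the unconstrained factorization step; in the paper's setting, the knowledge graph accumulated from previous domains and the BPE-based subword graph act as extra regularization on these factors.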

Cite

APA

Qin, X., Lu, Y., Chen, Y., & Rao, Y. (2021). Lifelong Learning of Topics and Domain-Specific Word Embeddings. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 (pp. 2294–2309). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2021.findings-acl.202
