Abstract
Lifelong topic models mainly focus on in-domain text streams in which each chunk contains documents from only a single domain. To cope with the limited data diversity of an in-domain corpus, most existing methods exploit information from a small number of sources in a separate, heuristic manner. In this study, we develop a lifelong collaborative model (LCM) based on non-negative matrix factorization to accurately learn topics and domain-specific word embeddings. LCM particularly investigates: (1) building a knowledge graph from the semantic relationships among words during lifelong learning, so as to accumulate both the global context information discovered by topic models and the local context information reflected by context word embeddings from previous domains, and (2) building a subword graph based on byte pair encoding and pairwise word relationships to exploit subword information in the current in-domain corpus. To the best of our knowledge, we are the first to collaboratively learn topics and word embeddings via lifelong learning. Experiments on real-world in-domain text streams validate the effectiveness of our method.
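As background for the factorization underlying the abstract, the sketch below shows plain non-negative matrix factorization (V ≈ WH) on a toy term-document matrix, using the classic Lee-Seung multiplicative updates. This is an illustrative sketch only, not the paper's LCM model; the toy matrix, topic count, and iteration budget are all assumptions made for the example.

```python
# Illustrative NMF sketch (not the paper's LCM): factor a toy
# term-document count matrix V into non-negative word-topic weights W
# and topic-document weights H, so that V is approximated by W @ H.
import numpy as np

rng = np.random.default_rng(0)

# Toy term-document count matrix: 6 words x 4 documents,
# with a rough two-block (two-topic) structure.
V = np.array([
    [3, 2, 0, 0],
    [2, 3, 0, 1],
    [1, 2, 0, 0],
    [0, 0, 3, 2],
    [0, 1, 2, 3],
    [0, 0, 2, 2],
], dtype=float)

n_topics = 2
W = rng.random((V.shape[0], n_topics))  # word-topic weights
H = rng.random((n_topics, V.shape[1]))  # topic-document weights

# Lee-Seung multiplicative updates minimizing ||V - WH||_F^2;
# the small epsilon guards against division by zero.
for _ in range(200):
    H *= (W.T @ V) / (W.T @ W @ H + 1e-9)
    W *= (V @ H.T) / (W @ H @ H.T + 1e-9)

print(np.round(W @ H, 1))  # reconstruction, close to V
```

The top-weighted words in each column of W would be read off as a topic; LCM extends this basic scheme with graph-based regularization from prior domains and subword information, which the sketch omits.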
Citation
Qin, X., Lu, Y., Chen, Y., & Rao, Y. (2021). Lifelong Learning of Topics and Domain-Specific Word Embeddings. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 (pp. 2294–2309). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2021.findings-acl.202