High-quality phrase representations are essential for finding topics and related terms in documents (a.k.a. topic mining). Existing phrase representation learning methods either simply combine unigram representations in a context-free manner or rely on extensive annotations to learn context-aware knowledge. In this paper, we propose UCTOPIC, a novel unsupervised contrastive learning framework for context-aware phrase representations and topic mining. UCTOPIC is pretrained at a large scale to distinguish whether the contexts of two phrase mentions have the same semantics. The key to pretraining is positive pair construction from our phrase-oriented assumptions. However, we find that traditional in-batch negatives cause performance decay when finetuning on a dataset with a small number of topics. Hence, we propose cluster-assisted contrastive learning (CCL), which largely reduces noisy negatives by selecting negatives from clusters, and accordingly further improves phrase representations for topics. UCTOPIC outperforms the state-of-the-art phrase representation model by 38.2% NMI on average across four entity clustering tasks. Comprehensive evaluation on topic mining shows that UCTOPIC can extract coherent and diverse topical phrases.
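The abstract contrasts in-batch negatives with cluster-assisted negative selection. The following is a minimal illustrative sketch (not the paper's implementation; the function names, the InfoNCE-style loss form, and the temperature value are assumptions) of the core idea: draw negatives only from clusters other than the anchor's, so that phrases likely sharing the anchor's topic are not treated as negatives.

```python
import numpy as np

def info_nce(anchor, positive, negatives, tau=0.07):
    """InfoNCE-style contrastive loss for a single anchor (illustrative sketch).

    The positive occupies index 0 of the similarity vector; tau is an
    assumed temperature, not a value from the paper.
    """
    def cos(a, b):
        return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
    sims = np.array([cos(anchor, positive)] + [cos(anchor, n) for n in negatives])
    logits = sims / tau
    logits -= logits.max()  # subtract max for numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])  # cross-entropy with the positive as the target

def select_cluster_negatives(anchor_cluster, clusters, embeddings):
    """Sketch of the cluster-assisted idea: negatives come only from
    *other* clusters, unlike in-batch sampling, which can accidentally
    include same-topic phrases as negatives."""
    return [emb for emb, c in zip(embeddings, clusters) if c != anchor_cluster]
```

For example, with four phrase embeddings assigned to two clusters, an anchor in cluster 0 would draw its negatives only from the two cluster-1 embeddings; the same-cluster (likely same-topic) phrase is excluded, which is the noise reduction the abstract attributes to CCL.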
CITATION STYLE
Li, J., Shang, J., & McAuley, J. (2022). UCTopic: Unsupervised Contrastive Learning for Phrase Representations and Topic Mining. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (Vol. 1, pp. 6159–6169). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.acl-long.426