Weakly Supervised Text Classification using Supervision Signals from a Language Model


Abstract

Solving text classification in a weakly supervised manner is important for real-world applications where human annotations are scarce. In this paper, we propose to query a masked language model with cloze-style prompts to obtain supervision signals. We design a prompt that combines the document itself with the template "this article is talking about [MASK]." A masked language model can generate words for the [MASK] token, and these generated words, which summarize the content of the document, can be used as supervision signals. We propose a latent variable model that simultaneously learns a word-distribution learner, which associates generated words with pre-defined categories, and a document classifier, without using any annotated data. Evaluation on three datasets, AGNews, 20Newsgroups, and UCINews, shows that our method outperforms baselines by 2%, 4%, and 3%, respectively.
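The supervision-signal step described above can be sketched in a few lines. This is an illustrative sketch, not the authors' released code: it assumes `bert-base-uncased` as the masked language model via the Hugging Face `transformers` fill-mask pipeline, and the example document text is invented.

```python
def build_cloze_prompt(document: str) -> str:
    """Combine the document with the paper's cloze template."""
    return f"{document} this article is talking about [MASK]."


def supervision_words(document: str, top_k: int = 5) -> list[str]:
    """Query a masked LM for the [MASK] slot and return candidate words.

    These candidates act as supervision signals; the paper's latent
    variable model then associates them with pre-defined categories.
    """
    from transformers import pipeline  # deferred import: heavy dependency

    fill_mask = pipeline("fill-mask", model="bert-base-uncased")
    return [r["token_str"] for r in fill_mask(build_cloze_prompt(document), top_k=top_k)]


if __name__ == "__main__":
    # Invented example document; a real run might yield words like "sports".
    doc = "The team won the championship after a dramatic overtime goal."
    print(build_cloze_prompt(doc))
```

Note that the prompt construction is trivial by design: the key idea is that appending the cloze template turns an off-the-shelf masked LM into a zero-cost annotator, so no labeled data is needed at this stage.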

Citation (APA)

Zeng, Z., Ni, W., Fang, T., Li, X., Zhao, X., & Song, Y. (2022). Weakly Supervised Text Classification using Supervision Signals from a Language Model. In Findings of the Association for Computational Linguistics: NAACL 2022 - Findings (pp. 2295–2305). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.findings-naacl.176
