Language Models in the Loop: Incorporating Prompting into Weak Supervision

  • Smith R
  • Fries J
  • Hancock B
  • Bach S

Abstract

We propose a new strategy for applying large pre-trained language models to novel tasks when labeled training data is limited. Rather than apply the model in a typical zero-shot or few-shot fashion, we treat the model as the basis for labeling functions in a weak supervision framework. To create a classifier, we first prompt the model to answer multiple distinct queries about an example and define how the possible responses should be mapped to votes for labels and abstentions. We then denoise these noisy label sources using the Snorkel system and train an end classifier with the resulting training data. Our experimental evaluation shows that prompting large language models within a weak supervision framework can provide significant gains in accuracy. On the WRENCH weak supervision benchmark, this approach significantly improves over zero-shot performance, with an average 19.5% reduction in errors. We also find that this approach produces classifiers with comparable or superior accuracy to those trained from hand-engineered rules.

Problem statement

The goal of this paper is to use large language models to create smaller, specialized models. These specialized models can be better suited to specific tasks because they are tuned for them, and they are less expensive to serve in production. Existing approaches create training data for specialized models by prompting large language models in a zero-shot fashion, i.e., they instruct the language model to solve the task of interest and treat the responses as ground truth. This approach can be unreliable when the language model has noisy outputs and is sensitive to the wording of the prompt.

Methods

We address the problems of noisy outputs and prompt sensitivity by treating large language models as the basis for labeling functions in a weak supervision framework. To create a classifier, we first prompt the model to answer multiple distinct queries about an example and define how the possible responses should be mapped to votes for labels and abstentions. We then denoise these noisy label sources using the Snorkel system and train an end classifier with the resulting training data.

Results

Our experimental evaluation shows that prompting large language models within a weak supervision framework can provide significant gains in accuracy. On the WRENCH weak supervision benchmark, this approach significantly improves over zero-shot performance, with an average 19.5% reduction in errors. We also find that this approach produces classifiers with comparable or superior accuracy to those trained from hand-engineered rules.

Significance

Large language models are increasingly the starting point in many areas of machine learning. Incorporating prompting into weak supervision can enable users to more easily and accurately adapt them to specialized tasks.
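The first step of the method, turning prompts into labeling functions, can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: `query_llm` is a hypothetical placeholder for a real language-model API call, and the spam/ham task, prompts, and response maps are invented for the example. The key idea it shows is that each prompt template becomes a labeling function, and any response outside the declared mapping becomes an abstention.

```python
# Sketch: prompted labeling functions that map free-text LLM responses
# to label votes or abstentions. All names here are illustrative.
ABSTAIN, SPAM, HAM = -1, 1, 0

def query_llm(prompt: str) -> str:
    # Placeholder for a call to a hosted language model. Returns a
    # canned response here so the sketch runs without an API.
    return "yes" if "offer" in prompt.lower() else "no"

def make_labeling_function(question: str, response_map: dict):
    """Turn one prompt template into a labeling function.

    Responses not found in `response_map` become abstentions, so noisy
    or off-format model outputs do not inject spurious labels.
    """
    def lf(example: str) -> int:
        response = query_llm(f"{question}\n\nText: {example}").strip().lower()
        return response_map.get(response, ABSTAIN)
    return lf

# Several distinct queries about the same example, each voting independently.
lfs = [
    make_labeling_function(
        "Does this text contain a commercial offer? Answer yes or no.",
        {"yes": SPAM, "no": HAM}),
    make_labeling_function(
        "Is this message something a friend would send? Answer yes or no.",
        {"yes": HAM, "no": SPAM}),
]

votes = [lf("Limited-time offer: click now!") for lf in lfs]
```

Because the mapping is explicit, the same labeling function can be rerun with a reworded prompt, and disagreement between reworded variants simply shows up as conflicting votes for the denoising step to resolve.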
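The denoising step can be illustrated with a deliberately simplified stand-in for Snorkel's label model. Snorkel itself fits a generative model over labeling-function agreements and disagreements; the majority-vote version below is only a sketch of the interface, showing how a matrix of noisy votes collapses into one training label per example, with ties and all-abstain rows dropped.

```python
# Sketch: combine noisy label votes into training labels.
# Majority vote is a simplified stand-in for Snorkel's LabelModel.
from collections import Counter

ABSTAIN = -1

def denoise(vote_matrix: list[list[int]]) -> list[int]:
    """Each row holds all labeling functions' votes on one example.

    ABSTAIN votes are ignored; ties and all-abstain rows yield ABSTAIN,
    so those examples are excluded from the training set.
    """
    labels = []
    for row in vote_matrix:
        counts = Counter(v for v in row if v != ABSTAIN)
        if not counts:
            labels.append(ABSTAIN)
            continue
        top = counts.most_common(2)
        if len(top) > 1 and top[0][1] == top[1][1]:
            labels.append(ABSTAIN)  # tie: no confident label
        else:
            labels.append(top[0][0])
    return labels

# Rows: votes from three labeling functions on four examples.
L = [
    [1, 1, ABSTAIN],   # two votes agree -> label 1
    [0, ABSTAIN, 0],   # two votes agree -> label 0
    [1, 0, ABSTAIN],   # tie -> abstain
    [ABSTAIN] * 3,     # no votes -> abstain
]
train_labels = denoise(L)
```

The resulting labels would then be used to train the smaller end classifier, which is the model actually served in production.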

Cite

Smith, R., Fries, J. A., Hancock, B., & Bach, S. H. (2024). Language Models in the Loop: Incorporating Prompting into Weak Supervision. ACM / IMS Journal of Data Science, 1(2), 1–30. https://doi.org/10.1145/3617130
