Language Models in the Loop: Incorporating Prompting into Weak Supervision

  • Smith R
  • Fries J
  • Hancock B
  • Bach S

Abstract

We propose a new strategy for applying large pre-trained language models to novel tasks when labeled training data is limited. Rather than apply the model in a typical zero-shot or few-shot fashion, we treat the model as the basis for labeling functions in a weak supervision framework. To create a classifier, we first prompt the model to answer multiple distinct queries about an example and define how the possible responses should be mapped to votes for labels and abstentions. We then denoise these noisy label sources using the Snorkel system and train an end classifier with the resulting training data. Our experimental evaluation shows that prompting large language models within a weak supervision framework can provide significant gains in accuracy. On the WRENCH weak supervision benchmark, this approach significantly improves over zero-shot performance, with an average 19.5% reduction in errors. We also find that this approach produces classifiers with comparable or superior accuracy to those trained from hand-engineered rules.

Problem statement

The goal of this paper is to use large language models to create smaller, specialized models. These specialized models can be better suited to specific tasks because they are tuned for them, and they are less expensive to serve in production. Existing approaches create training data for specialized models by prompting large language models in a zero-shot fashion, i.e., they instruct the language model to solve the task of interest and treat the responses as ground truth. This approach can be unreliable when the language model has noisy outputs and is sensitive to the wording of the prompt.

Methods

We address the problems of noisy outputs and prompt sensitivity by treating large language models as the basis for labeling functions in a weak supervision framework. To create a classifier, we first prompt the model to answer multiple distinct queries about an example and define how the possible responses should be mapped to votes for labels and abstentions. We then denoise these noisy label sources using the Snorkel system and train an end classifier with the resulting training data.

Results

Our experimental evaluation shows that prompting large language models within a weak supervision framework can provide significant gains in accuracy. On the WRENCH weak supervision benchmark, this approach significantly improves over zero-shot performance, with an average 19.5% reduction in errors. We also find that this approach produces classifiers with comparable or superior accuracy to those trained from hand-engineered rules.

Significance

Large language models are increasingly the starting point in many areas of machine learning. Incorporating prompting into weak supervision can enable users to more easily and accurately adapt them to specialized tasks.
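The first step of the method, turning prompts into labeling functions, can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: `query_llm` is a hypothetical placeholder for a real language-model API call, and the spam/ham task, prompts, and response maps are invented for the example. The key idea it shows is that each prompt template becomes a labeling function, and any response outside the declared mapping becomes an abstention.

```python
# Sketch: prompted labeling functions that map free-text LLM responses
# to label votes or abstentions. All names here are illustrative.
ABSTAIN, SPAM, HAM = -1, 1, 0

def query_llm(prompt: str) -> str:
    # Placeholder for a call to a hosted language model. Returns a
    # canned response here so the sketch runs without an API.
    return "yes" if "offer" in prompt.lower() else "no"

def make_labeling_function(question: str, response_map: dict):
    """Turn one prompt template into a labeling function.

    Responses not found in `response_map` become abstentions, so noisy
    or off-format model outputs do not inject spurious labels.
    """
    def lf(example: str) -> int:
        response = query_llm(f"{question}\n\nText: {example}").strip().lower()
        return response_map.get(response, ABSTAIN)
    return lf

# Several distinct queries about the same example, each voting independently.
lfs = [
    make_labeling_function(
        "Does this text contain a commercial offer? Answer yes or no.",
        {"yes": SPAM, "no": HAM}),
    make_labeling_function(
        "Is this message something a friend would send? Answer yes or no.",
        {"yes": HAM, "no": SPAM}),
]

votes = [lf("Limited-time offer: click now!") for lf in lfs]
```

Because the mapping is explicit, the same labeling function can be rerun with a reworded prompt, and disagreement between reworded variants simply shows up as conflicting votes for the denoising step to resolve.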
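The denoising step can be illustrated with a deliberately simplified stand-in for Snorkel's label model. Snorkel itself fits a generative model over labeling-function agreements and disagreements; the majority-vote version below is only a sketch of the interface, showing how a matrix of noisy votes collapses into one training label per example, with ties and all-abstain rows dropped.

```python
# Sketch: combine noisy label votes into training labels.
# Majority vote is a simplified stand-in for Snorkel's LabelModel.
from collections import Counter

ABSTAIN = -1

def denoise(vote_matrix: list[list[int]]) -> list[int]:
    """Each row holds all labeling functions' votes on one example.

    ABSTAIN votes are ignored; ties and all-abstain rows yield ABSTAIN,
    so those examples are excluded from the training set.
    """
    labels = []
    for row in vote_matrix:
        counts = Counter(v for v in row if v != ABSTAIN)
        if not counts:
            labels.append(ABSTAIN)
            continue
        top = counts.most_common(2)
        if len(top) > 1 and top[0][1] == top[1][1]:
            labels.append(ABSTAIN)  # tie: no confident label
        else:
            labels.append(top[0][0])
    return labels

# Rows: votes from three labeling functions on four examples.
L = [
    [1, 1, ABSTAIN],   # two votes agree -> label 1
    [0, ABSTAIN, 0],   # two votes agree -> label 0
    [1, 0, ABSTAIN],   # tie -> abstain
    [ABSTAIN] * 3,     # no votes -> abstain
]
train_labels = denoise(L)
```

The resulting labels would then be used to train the smaller end classifier, which is the model actually served in production.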

Cite

Smith, R., Fries, J. A., Hancock, B., & Bach, S. H. (2024). Language Models in the Loop: Incorporating Prompting into Weak Supervision. ACM / IMS Journal of Data Science, 1(2), 1–30. https://doi.org/10.1145/3617130
