Want to Reduce Labeling Cost? GPT-3 Can Help

Citations: 114
Readers (Mendeley): 160

Abstract

Data annotation is a time-consuming and labor-intensive process for many NLP tasks. Although there exist various methods to produce pseudo data labels, they are often task-specific and require a decent amount of labeled data to start with. Recently, the immense language model GPT-3, with 175 billion parameters, has achieved tremendous improvement across many few-shot learning tasks. In this paper, we explore ways to leverage GPT-3 as a low-cost data labeler to train other models. We find that, to make the downstream model achieve the same performance on a variety of NLU and NLG tasks, it costs 50% to 96% less to use labels from GPT-3 than to use labels from humans. Furthermore, we propose a novel framework for combining pseudo labels from GPT-3 with human labels, which leads to even better performance with a limited labeling budget. These results present a cost-effective data labeling methodology that is generalizable to many practical applications.
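The core idea sketched in the abstract is to prompt GPT-3 with a handful of human-labeled examples and use its completions as cheap pseudo labels for the rest of the data. The snippet below is a minimal illustration of that workflow, not the paper's exact setup: the prompt format, the sentiment label set, and the `complete` callback (standing in for a real LLM API call) are all illustrative assumptions.

```python
# Hedged sketch of few-shot pseudo-labeling with a large language model.
# The task (sentiment), prompt template, and `complete` callback are
# hypothetical; the paper's actual prompts and tasks may differ.

def build_few_shot_prompt(labeled_examples, query_text):
    """Assemble a few-shot classification prompt from (text, label) pairs."""
    parts = ["Classify the sentiment of each review as Positive or Negative.", ""]
    for text, label in labeled_examples:
        parts.append(f"Review: {text}")
        parts.append(f"Label: {label}")
        parts.append("")
    parts.append(f"Review: {query_text}")
    parts.append("Label:")  # the model is expected to continue from here
    return "\n".join(parts)

def pseudo_label(prompt, complete):
    """Obtain a pseudo label by calling a completion function and
    parsing the first token of its reply. `complete` stands in for
    any prompt -> text function, e.g. a wrapper around an LLM API."""
    return complete(prompt).strip().split()[0]

if __name__ == "__main__":
    seed = [("Great movie, loved it.", "Positive"),
            ("Terrible plot and acting.", "Negative")]
    prompt = build_few_shot_prompt(seed, "An absolute delight from start to finish.")
    # Trivial stand-in "model" for demonstration purposes only:
    print(pseudo_label(prompt, lambda p: " Positive"))
```

The resulting pseudo labels would then be mixed with whatever human labels the budget allows, which is the combination strategy the paper evaluates.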

Citation (APA)

Wang, S., Liu, Y., Xu, Y., Zhu, C., & Zeng, M. (2021). Want to Reduce Labeling Cost? GPT-3 Can Help. In Findings of the Association for Computational Linguistics: EMNLP 2021 (pp. 4195–4205). Association for Computational Linguistics. https://doi.org/10.18653/v1/2021.findings-emnlp.354

Readers' Seniority

PhD / Post grad / Masters / Doc: 36 (61%)
Researcher: 15 (25%)
Professor / Associate Prof.: 5 (8%)
Lecturer / Post doc: 3 (5%)

Readers' Discipline

Computer Science: 49 (78%)
Business, Management and Accounting: 6 (10%)
Engineering: 4 (6%)
Linguistics: 4 (6%)
