Data annotation is a time-consuming and labor-intensive process for many NLP tasks. Although various methods exist to produce pseudo data labels, they are often task-specific and require a decent amount of labeled data to start with. Recently, the large-scale language model GPT-3, with 175 billion parameters, has achieved tremendous improvement across many few-shot learning tasks. In this paper, we explore ways to leverage GPT-3 as a low-cost data labeler to train other models. We find that, to make the downstream model achieve the same performance on a variety of NLU and NLG tasks, it costs 50% to 96% less to use labels from GPT-3 than to use labels from humans. Furthermore, we propose a novel framework that combines pseudo labels from GPT-3 with human labels, which leads to even better performance with a limited labeling budget. These results present a cost-effective data labeling methodology that is generalizable to many practical applications.
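The labeling recipe described in the abstract can be illustrated with a minimal sketch: a handful of human-labeled examples serve as few-shot demonstrations in a prompt, GPT-3 completes the label for each unlabeled example, and the resulting pseudo labels are combined with the human labels to train a downstream model. The code below is an assumption-laden illustration, not the authors' implementation: `query_gpt3` is a hypothetical helper that wraps a GPT-3 completion endpoint, and the sentiment label set and prompt format are invented for the example.

```python
# Illustrative sketch (not the authors' code): use a few human-labeled examples as
# few-shot demonstrations and ask GPT-3 to pseudo-label the remaining data.
# `query_gpt3` is a hypothetical callable that sends a prompt to a GPT-3
# completion endpoint and returns the generated text.

from typing import Callable, List, Optional, Tuple

LABELS = ["positive", "negative"]  # example label set for a sentiment task


def build_prompt(demos: List[Tuple[str, str]], query: str) -> str:
    """Format human-labeled demonstrations followed by the unlabeled query."""
    lines = ["Classify the sentiment of each review as positive or negative.", ""]
    for text, label in demos:
        lines.append(f"Review: {text}\nSentiment: {label}\n")
    lines.append(f"Review: {query}\nSentiment:")
    return "\n".join(lines)


def pseudo_label(
    demos: List[Tuple[str, str]],
    unlabeled: List[str],
    query_gpt3: Callable[[str], str],
) -> List[Tuple[str, Optional[str]]]:
    """Label each unlabeled example with GPT-3; keep None if the answer is off-label."""
    labeled = []
    for text in unlabeled:
        answer = query_gpt3(build_prompt(demos, text)).strip().lower()
        labeled.append((text, answer if answer in LABELS else None))
    return labeled

# Mixing strategy (illustrative): spend a small part of the labeling budget on human
# labels used as demonstrations, spend the rest on many cheaper GPT-3 pseudo labels,
# then train the downstream model on the union of the two label sources.
```

How the budget is split between human and GPT-3 labels, and how the two sources are weighted during training, are exactly the design choices the paper's framework addresses; the sketch only shows the overall flow.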
CITATION STYLE
Wang, S., Liu, Y., Xu, Y., Zhu, C., & Zeng, M. (2021). Want to Reduce Labeling Cost? GPT-3 Can Help. In Findings of the Association for Computational Linguistics: EMNLP 2021 (pp. 4195–4205). Association for Computational Linguistics. https://doi.org/10.18653/v1/2021.findings-emnlp.354