Improving short text classification with augmented data using GPT-3

Abstract

GPT-3 is a large-scale natural language model developed by OpenAI that can perform many tasks, including topic classification. Although researchers claim that it requires only a small number of in-context examples to learn a task, in practice GPT-3 needs those training examples to be either of exceptional quality or in larger quantities than can easily be written by hand. To address this issue, this study teaches GPT-3 to classify whether a question is related to data science by augmenting a small training set with additional examples generated by GPT-3 itself. It compares two augmented classifiers: the Classification Endpoint with an enlarged training set, and the Completion Endpoint with an augmented prompt optimized using a genetic algorithm. We find that data augmentation significantly increases the accuracy of both classifiers, and that the embedding-based Classification Endpoint achieves the best accuracy, about 76%, compared with human accuracy of 85%. Giving large language models such as GPT-3 the ability to propose their own training examples can thus improve short text classification performance.
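The augmentation idea described in the abstract (having GPT-3 propose additional labelled questions, then classifying new questions few-shot) could look roughly like the sketch below. This is a minimal illustration against the legacy openai Python SDK (versions before 1.0), not the authors' code: the model name, prompt wording, class labels, and helper functions are assumptions, and the paper's full pipeline with the Classification Endpoint and genetic-algorithm prompt optimization is not reproduced here.

```python
import openai

# openai.api_key must be set, e.g. via the OPENAI_API_KEY environment variable.

# Hand-written seed examples (illustrative labels, not the paper's dataset).
SEED_EXAMPLES = [
    ("How do I tune the hyperparameters of a random forest?", "data science"),
    ("What is the best way to cook rice?", "not data science"),
]

def generate_augmented_examples(label, n=5):
    """Ask GPT-3 to propose new labelled questions for one class (data augmentation)."""
    prompt = (
        f"Write {n} short questions that belong to the category '{label}'. "
        "Return one question per line."
    )
    response = openai.Completion.create(
        model="text-davinci-003",  # assumed GPT-3-era model, not necessarily the paper's
        prompt=prompt,
        max_tokens=256,
        temperature=0.9,           # high temperature to encourage diverse examples
    )
    lines = response["choices"][0]["text"].strip().splitlines()
    return [(ln.strip("-*0123456789. ").strip(), label) for ln in lines if ln.strip()]

def classify(question, examples):
    """Few-shot classification of a question with the Completion endpoint."""
    shots = "\n".join(f"Question: {q}\nLabel: {lab}" for q, lab in examples)
    prompt = f"{shots}\nQuestion: {question}\nLabel:"
    response = openai.Completion.create(
        model="text-davinci-003",
        prompt=prompt,
        max_tokens=5,
        temperature=0.0,           # deterministic choice of label
    )
    return response["choices"][0]["text"].strip()

if __name__ == "__main__":
    # Augment the hand-written seed set with model-generated examples, then classify.
    training_set = list(SEED_EXAMPLES)
    for label in ("data science", "not data science"):
        training_set += generate_augmented_examples(label)
    print(classify("How should I impute missing values in a dataset?", training_set))
```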

Cite

APA

Balkus, S. V., & Yan, D. (2023). Improving short text classification with augmented data using GPT-3. Natural Language Engineering. https://doi.org/10.1017/S1351324923000438
