Recently, active learning (AL) methods have been used to effectively fine-tune pre-trained language models for various NLP tasks such as sentiment analysis and document classification. However, when fine-tuning language models, the impact of factors such as labeling cost, sample acquisition latency, and dataset diversity on AL methods remains insufficiently understood and warrants deeper investigation. This paper examines the performance of existing AL methods in a low-resource, interactive labeling setting. We observe that existing methods often underperform in this setting while exhibiting higher latency and limited generalizability. To overcome these challenges, we propose TYROGUE, a novel active learning method that employs a hybrid sampling strategy to minimize labeling cost and acquisition latency while providing a framework for adapting to dataset diversity via user guidance. Our experiments show that, compared to state-of-the-art methods, TYROGUE reduces labeling cost by up to 43% and acquisition latency by as much as 11X, while achieving comparable accuracy. Finally, we discuss the strengths and weaknesses of TYROGUE by exploring the impact of dataset characteristics.
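To make the idea of a hybrid sampling strategy concrete, the sketch below combines a diversity step (clustering the unlabeled pool) with an uncertainty step (entropy ranking within clusters). It is a generic illustration only, not TYROGUE's actual acquisition function; the function name `hybrid_acquire` and the parameters `k_clusters` and `batch_size` are assumptions introduced for this example.

```python
# Minimal sketch of a hybrid (diversity + uncertainty) acquisition step for
# pool-based active learning. Illustrative only -- NOT the authors' TYROGUE
# algorithm; names and parameter choices here are assumptions.
import numpy as np
from sklearn.cluster import KMeans


def hybrid_acquire(embeddings: np.ndarray,
                   class_probs: np.ndarray,
                   batch_size: int = 8,
                   k_clusters: int = 32,
                   seed: int = 0) -> np.ndarray:
    """Select a batch of unlabeled examples to send to the human labeler.

    embeddings : (n, d) sentence/document embeddings of the unlabeled pool
    class_probs: (n, c) class probabilities from the current fine-tuned model
    Returns indices of the selected examples.
    """
    # 1) Diversity step: cluster the pool so the batch spans distinct regions.
    #    Restricting scoring to cluster representatives also keeps the number
    #    of candidates small, which reduces acquisition latency.
    km = KMeans(n_clusters=k_clusters, n_init=10, random_state=seed)
    cluster_ids = km.fit_predict(embeddings)

    # 2) Uncertainty step: rank examples by predictive entropy.
    entropy = -np.sum(class_probs * np.log(class_probs + 1e-12), axis=1)

    # Take the most uncertain example from each non-empty cluster.
    best_per_cluster = []
    for c in range(k_clusters):
        members = np.where(cluster_ids == c)[0]
        if members.size == 0:
            continue
        best_per_cluster.append(members[np.argmax(entropy[members])])

    # Keep the batch_size representatives with the highest uncertainty.
    best_per_cluster = np.array(best_per_cluster)
    order = np.argsort(-entropy[best_per_cluster])
    return best_per_cluster[order[:batch_size]]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    emb = rng.normal(size=(500, 64))      # placeholder embeddings
    logits = rng.normal(size=(500, 4))    # placeholder model outputs
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    print(hybrid_acquire(emb, probs))
```

In a full AL loop, the selected indices would be labeled by the user, added to the training set, and the model re-fine-tuned before the next acquisition round.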
Maekawa, S., Zhang, D., Kim, H., Rahman, S., & Hruschka, E. (2022). Low-resource Interactive Active Labeling for Fine-tuning Language Models. In Findings of the Association for Computational Linguistics: EMNLP 2022 (pp. 3230–3242). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.findings-emnlp.235