Deuce: Dual-diversity Enhancement and Uncertainty-awareness for Cold-start Active Learning

0 citations · 16 Mendeley readers

Abstract

Cold-start active learning (CSAL) selects valuable instances from an unlabeled dataset for manual annotation, providing high-quality data at low annotation cost for label-scarce text classification. However, existing CSAL methods overlook weak classes and hard representative examples, resulting in biased learning. To address these issues, this paper proposes a novel dual-diversity enhancing and uncertainty-aware (Deuce) framework for CSAL. Specifically, Deuce leverages a pretrained language model (PLM) to efficiently extract textual representations, class predictions, and predictive uncertainty. It then constructs a Dual-Neighbor Graph (DNG) that combines information on both textual diversity and class diversity, ensuring a balanced data distribution. It further propagates uncertainty information via density-based clustering to select hard representative instances. By jointly exploiting dual-diversity and informativeness, Deuce selects class-balanced and hard representative data. Experiments on six NLP datasets demonstrate the superiority and efficiency of Deuce.
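To make the pipeline concrete, the following is a minimal toy sketch of a Deuce-style selection loop, not the authors' implementation. It assumes precomputed feature vectors, predicted classes, and uncertainty scores (which the paper obtains from a PLM), and simplifies the Dual-Neighbor Graph and density-based propagation to a k-nearest-neighbor graph with density-weighted uncertainty scoring, followed by a class-balanced round-robin pick. All function and variable names here are illustrative.

```python
import math
from collections import defaultdict

def select_cold_start(points, pred_classes, uncertainty, k, budget):
    """Toy Deuce-style selection (hypothetical simplification):
    favor dense, uncertain instances (informativeness + textual
    diversity) while balancing predicted classes (class diversity)."""
    n = len(points)

    def dist(a, b):
        return math.dist(points[a], points[b])

    # k-nearest-neighbor graph in feature space (stands in for the
    # textual-diversity side of the Dual-Neighbor Graph)
    knn = {i: sorted(range(n), key=lambda j: dist(i, j))[1:k + 1]
           for i in range(n)}

    # density = inverse mean distance to neighbors; propagate each
    # instance's neighbors' uncertainty so dense, uncertain regions
    # score highest (a crude stand-in for density-based clustering)
    score = {}
    for i in range(n):
        mean_d = sum(dist(i, j) for j in knn[i]) / k
        density = 1.0 / (1e-9 + mean_d)
        nbr_unc = sum(uncertainty[j] for j in knn[i]) / k
        score[i] = density * (uncertainty[i] + nbr_unc)

    # round-robin over predicted classes: take the highest-scoring
    # unselected instance of each class in turn, enforcing balance
    by_class = defaultdict(list)
    for i in sorted(score, key=score.get, reverse=True):
        by_class[pred_classes[i]].append(i)
    selected = []
    classes = sorted(by_class)
    while len(selected) < budget:
        progressed = False
        for c in classes:
            if by_class[c] and len(selected) < budget:
                selected.append(by_class[c].pop(0))
                progressed = True
        if not progressed:
            break
    return selected
```

With two well-separated pseudo-classes, a budget of 2 yields one instance from each class, illustrating how the round-robin step counteracts the class imbalance that uncertainty-only selection can produce.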

Citation (APA)

Guo, J., Philip Chen, C. L., Li, S., & Zhang, T. (2024). Deuce: Dual-diversity Enhancement and Uncertainty-awareness for Cold-start Active Learning. Transactions of the Association for Computational Linguistics, 12, 1736–1754. https://doi.org/10.1162/tacl_a_00731
