Abstract
Cold-start active learning (CSAL) selects valuable instances from an unlabeled dataset for manual annotation. It provides high-quality data at a low annotation cost for label-scarce text classification. However, existing CSAL methods overlook weak classes and hard representative examples, resulting in biased learning. To address these issues, this paper proposes a novel dual-diversity enhancing and uncertainty-aware (Deuce) framework for CSAL. Specifically, Deuce leverages a pretrained language model (PLM) to efficiently extract textual representations, class predictions, and predictive uncertainty. Then, it constructs a Dual-Neighbor Graph (DNG) to combine information on both textual diversity and class diversity, ensuring a balanced data distribution. It further propagates uncertainty information via density-based clustering to select hard representative instances. By combining dual-diversity with informativeness, Deuce selects class-balanced and hard representative data. Experiments on six NLP datasets demonstrate the superiority and efficiency of Deuce.
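The selection pipeline sketched in the abstract (PLM-based representations, a dual-neighbor graph over textual and class views, and uncertainty-aware selection) can be illustrated with a short Python sketch. This is a hypothetical illustration only, not the authors' implementation: the function names (`build_dual_neighbor_graph`, `select_batch`), the kNN-union graph construction, the one-step uncertainty propagation, and all hyperparameters are assumptions made for the example, and it presumes PLM embeddings, predicted class probabilities, and per-instance uncertainty scores have already been computed.

```python
# Hypothetical sketch of a Deuce-style cold-start selection step.
# All names and hyperparameters are illustrative assumptions.
import numpy as np
from sklearn.neighbors import kneighbors_graph


def build_dual_neighbor_graph(embeddings, class_probs, k=10):
    """Union of kNN graphs built in the textual and class-prediction spaces."""
    g_text = kneighbors_graph(embeddings, k, mode="connectivity")
    g_class = kneighbors_graph(class_probs, k, mode="connectivity")
    # Merging both views is assumed to balance textual and class diversity.
    return ((g_text + g_class) > 0).astype(float)


def select_batch(embeddings, class_probs, uncertainty, budget, k=10):
    """Greedily pick uncertain points that are representative on the graph."""
    graph = build_dual_neighbor_graph(embeddings, class_probs, k)
    # Propagate uncertainty to graph neighbors so that dense, hard regions
    # receive high scores (a crude stand-in for density-based propagation).
    score = uncertainty + (graph @ uncertainty) / k
    covered = np.zeros(len(embeddings), dtype=bool)
    selected = []
    for _ in range(budget):
        idx = int(np.argmax(np.where(covered, -np.inf, score)))
        selected.append(idx)
        # Mark the chosen point and its neighbors as covered to keep the
        # annotation batch diverse.
        covered[idx] = True
        covered[graph[idx].toarray().ravel() > 0] = True
    return selected
```

On toy data (for example, random `embeddings`, softmax `class_probs`, and entropy-based `uncertainty`), `select_batch(embeddings, class_probs, uncertainty, budget=16)` would return the indices of instances to send for annotation under these assumptions.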
Guo, J., Philip Chen, C. L., Li, S., & Zhang, T. (2024). Deuce: Dual-diversity Enhancement and Uncertainty-awareness for Cold-start Active Learning. Transactions of the Association for Computational Linguistics, 12, 1736–1754. https://doi.org/10.1162/tacl_a_00731