Human Intelligence Tasks (HITs) allow people to collect and curate labeled data from multiple annotators. These labels are then aggregated to create an annotated dataset suitable for supervised machine learning tasks. The most popular label aggregation method is majority voting, where each item in the dataset is assigned the most common label among the annotators. This approach is optimal when annotators are unbiased domain experts. In this paper, we propose Debiased Label Aggregation (DLA), an alternative method for label aggregation in subjective HITs, where cross-annotator agreement varies. DLA leverages annotators' voting behavior patterns to weight their labels. Our experiments show that DLA outperforms majority voting on several performance metrics, e.g. an increase of 20 percentage points in F1 score before data augmentation and of 35 percentage points after data augmentation. Since DLA is deceptively simple, we hope it will help researchers tackle subjective labeling tasks.
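For readers unfamiliar with label aggregation, the sketch below contrasts plain majority voting with a weighted variant. The abstract does not specify how DLA derives its weights from voting-behavior patterns, so the agreement-based weighting and all names used here (majority_vote, weighted_vote, agreement_weights) are illustrative assumptions, not the paper's actual method.

```python
from collections import Counter, defaultdict

def majority_vote(labels):
    """Majority voting: return the most common label for one item."""
    return Counter(labels).most_common(1)[0][0]

def agreement_weights(dataset):
    """Toy per-annotator weight: how often each annotator agrees with the majority.

    This is only a stand-in for DLA's voting-behavior weighting, which the
    abstract does not describe in detail.
    """
    agree, total = defaultdict(int), defaultdict(int)
    for item_labels in dataset:  # one {annotator: label} dict per item
        majority = majority_vote(list(item_labels.values()))
        for annotator, label in item_labels.items():
            total[annotator] += 1
            agree[annotator] += int(label == majority)
    return {a: agree[a] / total[a] for a in total}

def weighted_vote(item_labels, annotator_weights):
    """Weighted aggregation: each annotator's label counts with that annotator's weight."""
    scores = defaultdict(float)
    for annotator, label in item_labels.items():
        scores[label] += annotator_weights.get(annotator, 1.0)
    return max(scores, key=scores.get)

# Usage: three annotators label two items of a subjective task.
dataset = [
    {"a1": "toxic", "a2": "toxic", "a3": "ok"},
    {"a1": "ok",    "a2": "toxic", "a3": "ok"},
]
weights = agreement_weights(dataset)
aggregated = [weighted_vote(item, weights) for item in dataset]
print(weights, aggregated)
```

Under this toy weighting, annotators who often side with the majority get more influence; the paper's DLA replaces this heuristic with its own behavior-based weights.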
Wallace, S., Cai, T., Le, B., & Leiva, L. A. (2022). Debiased Label Aggregation for Subjective Crowdsourcing Tasks. In Conference on Human Factors in Computing Systems - Proceedings. Association for Computing Machinery. https://doi.org/10.1145/3491101.3519614