Easy, Reproducible and Quality-Controlled Data Collection with CROWDAQ

Qiang Ning; Hao Wu; Pradeep Dasigi; Dheeru Dua; Matt Gardner; Robert L. Logan; Ana Marasović; Zhen Nie

Conference Proceedings, Open Access

EMNLP 2020 - Conference on Empirical Methods in Natural Language Processing, Proceedings of Systems Demonstrations (2020) 127-134

DOI: 10.18653/v1/2020.emnlp-demos.17

Abstract

High-quality and large-scale data are key to success for AI systems. However, large-scale data annotation efforts are often confronted with a set of common challenges: (1) designing a user-friendly annotation interface; (2) training enough annotators efficiently; and (3) reproducibility. To address these problems, we introduce CROWDAQ, an open-source platform that standardizes the data collection pipeline with customizable user-interface components, automated annotator qualification, and saved pipelines in a re-usable format. We show that CROWDAQ simplifies data annotation significantly on a diverse set of data collection use cases, and we hope it will be a convenient tool for the community.

Citation (APA)

Ning, Q., Wu, H., Dasigi, P., Dua, D., Gardner, M., Logan, R. L., … Nie, Z. (2020). Easy, Reproducible and Quality-Controlled Data Collection with CROWDAQ. In EMNLP 2020 - Conference on Empirical Methods in Natural Language Processing, Proceedings of Systems Demonstrations (pp. 127–134). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2020.emnlp-demos.17
