On the efficiency of data collection for crowdsourced classification

Edoardo Manino; Long Tran-Thanh; Nicholas R. Jennings

Conference Proceedings

IJCAI International Joint Conference on Artificial Intelligence (2018) 2018-July 1568-1575

DOI: 10.24963/ijcai.2018/217

8 Citations

19 Readers

Abstract

The quality of crowdsourced data is often highly variable. For this reason, it is common to collect redundant data and use statistical methods to aggregate it. Empirical studies show that the policies we use to collect such data have a strong impact on the accuracy of the system. However, there is little theoretical understanding of this phenomenon. In this paper we provide the first theoretical explanation of the accuracy gap between the most popular collection policies: the non-adaptive uniform allocation, and the adaptive uncertainty sampling and information gain maximisation. To do so, we propose a novel representation of the collection process in terms of random walks. Then, we use this tool to derive lower and upper bounds on the accuracy of the policies. With these bounds, we are able to quantify the advantage that the two adaptive policies have over the non-adaptive one for the first time.
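
As an illustration only, and not the paper's own code or analysis, the sketch below simulates two of the policies named in the abstract under simplifying assumptions that the abstract does not state: binary tasks, workers who all share one known accuracy, and a fixed total labelling budget. Under those assumptions the posterior log-odds of each task is proportional to its net vote count, so each task can be tracked as a simple integer random walk. The function name simulate and all of its parameters are hypothetical. Information gain maximisation is left out to keep the sketch short.

```python
import random


def simulate(policy, num_tasks=200, labels_per_task=5, accuracy=0.7, seed=0):
    """Return the fraction of tasks classified correctly under `policy`.

    Assumptions (not from the paper): binary classes in {-1, +1}, every
    worker answers correctly with the same probability `accuracy`, and the
    total budget is num_tasks * labels_per_task labels.
    """
    rng = random.Random(seed)
    truth = [rng.choice((-1, 1)) for _ in range(num_tasks)]
    # One random walk per task: net votes for class +1. With identical
    # workers the posterior log-odds is this count times a constant.
    walk = [0] * num_tasks

    for t in range(num_tasks * labels_per_task):
        if policy == "uniform":
            # Non-adaptive: round-robin, same number of labels per task.
            i = t % num_tasks
        elif policy == "uncertainty":
            # Adaptive: label the task whose walk is closest to zero.
            i = min(range(num_tasks), key=lambda j: abs(walk[j]))
        else:
            raise ValueError(f"unknown policy: {policy}")
        # A worker reports the true class with probability `accuracy`.
        label = truth[i] if rng.random() < accuracy else -truth[i]
        walk[i] += label

    correct = 0.0
    for votes, y in zip(walk, truth):
        if votes == 0:
            correct += 0.5  # a tie is scored as a fair coin flip
        elif (votes > 0) == (y > 0):
            correct += 1.0
    return correct / num_tasks


if __name__ == "__main__":
    for name in ("uniform", "uncertainty"):
        print(name, simulate(name))
```

Both policies spend the same budget; only the order in which tasks receive labels differs. The abstract's claim suggests the adaptive policy should tend to score higher when results are averaged over many seeds, although a single run of this toy simulation may not show it, and the sketch does not reproduce the paper's bounds.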

Citation (APA)

Manino, E., Tran-Thanh, L., & Jennings, N. R. (2018). On the efficiency of data collection for crowdsourced classification. In IJCAI International Joint Conference on Artificial Intelligence (Vol. 2018-July, pp. 1568–1575). International Joint Conferences on Artificial Intelligence. https://doi.org/10.24963/ijcai.2018/217
