CRUX: Adaptive querying for efficient crowdsourced data extraction

Abstract

Crowdsourcing is essential for collecting information about real-world entities. Existing crowdsourced data extraction solutions use fixed, non-adaptive querying strategies that repeatedly ask workers to provide entities from a fixed domain until a desired level of coverage is reached. Unfortunately, such solutions are highly impractical because they yield many duplicate extractions. We design an adaptive querying framework, CRUX, that maximizes the number of distinct extracted entities for a given budget. We show that the problem of budgeted crowdsourced entity extraction is NP-hard. We leverage two insights to focus our extraction efforts: exploiting the structure of the domain of interest, and using exclude lists to limit repeated extractions. We develop new statistical tools to estimate how many new distinct entities additional queries will extract when little information is available, and we embed them within adaptive algorithms that maximize the number of distinct extracted entities under budget constraints. We evaluate our techniques on synthetic and real-world datasets, demonstrating an improvement of up to 300% over competing approaches for the same budget.
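To make the duplicate-extraction problem and the role of exclude lists concrete, here is a minimal Python sketch that simulates extraction under a fixed budget. It is not CRUX's algorithm. The worker model (`simulate_worker`, which samples entities by popularity and is assumed to always honor the exclude list, with no limit on list length or extra cost), the Zipf-like popularity weights, and all helper names are hypothetical. The novelty estimate is the classic Good-Turing ratio f1/n (entities seen exactly once divided by total answers), a standard species-estimation tool swapped in here for illustration in place of the paper's own estimators.

```python
import random
from collections import Counter


def good_turing_new_rate(answers):
    """Good-Turing estimate of the probability that the next answer is an
    entity not seen so far: f1 / n, where f1 counts entities seen exactly
    once and n is the total number of answers collected."""
    n = len(answers)
    if n == 0:
        return 1.0  # no information yet: treat the next answer as new
    f1 = sum(1 for c in Counter(answers).values() if c == 1)
    return f1 / n


def simulate_worker(domain, weights, exclude, rng):
    """Hypothetical worker: picks an entity with probability proportional to
    its popularity weight, never returning one on the exclude list."""
    candidates = [(e, w) for e, w in zip(domain, weights) if e not in exclude]
    if not candidates:
        return None  # every entity is excluded; nothing left to extract
    entities, ws = zip(*candidates)
    return rng.choices(entities, weights=ws, k=1)[0]


def extract(domain, weights, budget, use_exclude_list, seed=0):
    """Spend `budget` paid queries; optionally send all entities seen so far
    as an exclude list with each query."""
    rng = random.Random(seed)
    answers, seen = [], set()
    for _ in range(budget):
        exclude = seen if use_exclude_list else set()
        answer = simulate_worker(domain, weights, exclude, rng)
        if answer is None:
            break
        answers.append(answer)
        seen.add(answer)
    return answers


if __name__ == "__main__":
    domain = [f"e{i}" for i in range(50)]
    weights = [1.0 / (i + 1) for i in range(50)]  # a few popular entities
    plain = extract(domain, weights, budget=30, use_exclude_list=False)
    with_ex = extract(domain, weights, budget=30, use_exclude_list=True)
    print("distinct without exclude list:", len(set(plain)))
    print("distinct with exclude list:   ", len(set(with_ex)))
    print("estimated new-entity rate:    ", good_turing_new_rate(plain))
```

Under this idealized worker model, every paid answer with the exclude list is a new entity, while the unconstrained strategy tends to repeat the popular head entities. The Good-Turing rate on the unconstrained answers gives a rough signal of whether further queries of that kind are still likely to pay off, which is the kind of decision an adaptive strategy has to make.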

Citation (APA)

Rekatsinas, T., Deshpande, A., & Parameswaran, A. (2019). CRUX: Adaptive querying for efficient crowdsourced data extraction. In International Conference on Information and Knowledge Management, Proceedings (pp. 841–850). Association for Computing Machinery. https://doi.org/10.1145/3357384.3357976