CRUX: Adaptive querying for efficient crowdsourced data extraction

Theodoros Rekatsinas; Amol Deshpande; Aditya Parameswaran

Conference Proceedings, Open Access

International Conference on Information and Knowledge Management, Proceedings (2019) 841-850

DOI: 10.1145/3357384.3357976

2 Citations

6 Readers

Abstract

Crowdsourcing is essential for collecting information about real-world entities. Existing crowdsourced data extraction solutions use fixed, non-adaptive querying strategies that repeatedly ask workers to provide entities from a fixed domain until a desired level of coverage is reached. Unfortunately, such solutions are highly impractical because they yield many duplicate extractions. We design an adaptive querying framework, CRUX, that maximizes the number of extracted entities for a given budget. We show that the problem of budgeted crowdsourced entity extraction is NP-hard. We leverage two insights to focus our extraction efforts: exploiting the structure of the domain of interest, and using exclude lists to limit repeated extractions. We develop new statistical tools to reason about the number of new distinct entities that additional queries would extract when little information is available, and embed them within adaptive algorithms that maximize the number of distinct extracted entities under budget constraints. We evaluate our techniques on synthetic and real-world datasets, demonstrating an improvement of up to 300% over competing approaches for the same budget.

Cite

Rekatsinas, T., Deshpande, A., & Parameswaran, A. (2019). CRUX: Adaptive querying for efficient crowdsourced data extraction. In International Conference on Information and Knowledge Management, Proceedings (pp. 841–850). Association for Computing Machinery. https://doi.org/10.1145/3357384.3357976