Abstract
We study a crowdsourcing setting where we need to infer the latent truth about a task given observed labels together with context in the form of a classifier score. We present Theodon, a hierarchical non-parametric Bayesian model, developed and deployed at Meta, that captures both the prevalence of label categories and the accuracy of labelers as functions of the classifier score. Theodon uses Gaussian processes to model the non-uniformity of mistakes over the range of classifier scores. Our experiments use data generated from integrity applications at Meta as well as public datasets. We show that Theodon (1) improves AUC-PR by 1-4% over state-of-the-art baselines when predicting items' true labels on public datasets, (2) is effective as a calibration method, and (3) provides detailed insights into labelers' performance.
Nguyen, V. A., Shi, P., Ramakrishnan, J., Torabi, N., Arora, N. S., Weinsberg, U., & Tingley, M. (2022). Crowdsourcing with Contextual Uncertainty. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 3645–3655). Association for Computing Machinery. https://doi.org/10.1145/3534678.3539184