Probabilistic Modeling for Crowdsourcing Partially-Subjective Ratings


Abstract

While many methods have been proposed to ensure data quality for objective tasks (in which a single correct response is presumed to exist for each item), estimating data quality for subjective tasks remains largely unexplored. Consider the popular task of collecting instance ratings from human judges: while agreement tends to be high for instances having extremely good or bad properties, instances with more middling properties naturally elicit a wider variance in opinion. In addition, because such subjectivity permits a valid diversity of responses, it can be difficult to detect whether a judge undertakes the task in good faith. To address this, we propose a probabilistic, heteroskedastic model in which the means and variances of worker responses are modeled as functions of instance attributes. We derive efficient Expectation Maximization (EM) learning and variational inference algorithms for parameter estimation. We apply our model to a large dataset of 24,132 Mechanical Turk ratings of user experience in viewing videos on smartphones with varying hardware capabilities. Results show that our method is effective at both predicting user ratings and detecting unreliable respondents.
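The core modeling idea above can be sketched in a few lines: both the mean and the (log) variance of a rating are linear functions of instance attributes, and parameters are fit by maximizing the Gaussian likelihood. The sketch below is illustrative only; it uses plain gradient descent on synthetic data rather than the authors' EM/variational algorithms, and all variable names are assumptions, not the paper's notation.

```python
import numpy as np

# Hedged sketch of a heteroskedastic Gaussian model: a rating y for an
# instance with attribute vector x is modeled as
#   y ~ Normal(mean = x @ w, variance = exp(x @ v)),
# so both the expected rating and its spread depend on the attributes.
# Synthetic data; true_w / true_v are illustrative ground-truth weights.

rng = np.random.default_rng(0)
n, d = 500, 3
X = rng.normal(size=(n, d))                 # instance attributes
true_w = np.array([1.0, -0.5, 0.3])         # mean weights
true_v = np.array([0.2, 0.0, -0.4])         # log-variance weights
y = X @ true_w + rng.normal(size=n) * np.exp(0.5 * (X @ true_v))

def nll(w, v):
    """Negative log-likelihood of y ~ N(X @ w, exp(X @ v))."""
    mu, log_var = X @ w, X @ v
    return 0.5 * np.sum(log_var + (y - mu) ** 2 / np.exp(log_var))

# Maximum-likelihood fit by (averaged) gradient descent on (w, v).
w, v, lr = np.zeros(d), np.zeros(d), 0.01
for _ in range(2000):
    mu, log_var = X @ w, X @ v
    inv_var = np.exp(-log_var)
    resid = y - mu
    grad_w = -X.T @ (resid * inv_var)                   # d NLL / d w
    grad_v = 0.5 * X.T @ (1.0 - resid ** 2 * inv_var)   # d NLL / d v
    w -= lr * grad_w / n
    v -= lr * grad_v / n

print("recovered mean weights:", np.round(w, 2))
print("recovered log-var weights:", np.round(v, 2))
```

The variance head is what distinguishes this from ordinary regression: instances whose attributes predict a large `exp(x @ v)` are expected to draw genuinely divergent opinions, so high rater disagreement there is not, by itself, evidence of an unreliable worker.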

Citation (APA)

Nguyen, A. T., Halpern, M., Wallace, B. C., & Lease, M. (2016). Probabilistic Modeling for Crowdsourcing Partially-Subjective Ratings. In Proceedings of the 4th AAAI Conference on Human Computation and Crowdsourcing, HCOMP 2016 (pp. 149–158). AAAI Press. https://doi.org/10.1609/hcomp.v4i1.13274
