Uncertainty-aware generative models for inferring document class prevalence

12Citations
Citations of this article
101Readers
Mendeley users who have this article in their library.

Abstract

Prevalence estimation is the task of inferring the relative frequency of classes of unlabeled examples in a group-for example, the proportion of a document collection with positive sentiment. Previous work has focused on aggregating and adjusting discriminative individual classifiers to obtain prevalence point estimates. But imperfect classifier accuracy ought to be reflected in uncertainty over the predicted prevalence for scientifically valid inference. In this work, we present (1) a generative probabilistic modeling approach to prevalence estimation, and (2) the construction and evaluation of prevalence confidence intervals; in particular, we demonstrate that an off-the-shelf discriminative classifier can be given a generative re-interpretation, by backing out an implicit individual-level likelihood function, which can be used to conduct fast and simple group-level Bayesian inference. Empirically, we demonstrate our approach provides better confidence interval coverage than an alternative, and is dramatically more robust to shifts in the class prior between training and testing.1

Cite

CITATION STYLE

APA

Keith, K. A., & O’Connor, B. (2018). Uncertainty-aware generative models for inferring document class prevalence. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, EMNLP 2018 (pp. 4575–4585). Association for Computational Linguistics. https://doi.org/10.18653/v1/d18-1487

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free