Estimation of discourse segmentation labels from crowd data

Abstract

For annotation tasks involving independent judgments, probabilistic models have been used to infer ground truth labels from data where a crowd of many annotators labels the same items. Such models have been shown to produce results superior to taking the majority vote, but have not been applied to sequential data. We present two methods to infer ground truth labels from sequential annotations where we assume judgments are not independent, based on the observation that an annotator's segments all tend to be several utterances long. The data consists of crowd labels for annotation of discourse segment boundaries. The new methods extend Hidden Markov Models to relax the independence assumption. The two methods are distinct, so positive labels proposed by both are taken to be ground truth. In addition, results of the models are checked using metrics that test whether an annotator's accuracy relative to a given model remains consistent across different conversations.
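As a rough illustration of two ideas mentioned in the abstract, the sketch below shows the majority-vote baseline over binary boundary labels, and the step of keeping only the positive labels that two methods both propose. It is not the authors' HMM-based models. The function names, the 0/1 encoding (1 marks a segment boundary at an utterance), and the toy data are all assumptions made for this example.

```python
from typing import List


def majority_vote(labels: List[List[int]]) -> List[int]:
    """Baseline: labels[a][i] is annotator a's 0/1 boundary label for utterance i.

    An utterance is marked as a boundary only if strictly more than half
    of the annotators marked it (hypothetical tie rule: ties give 0).
    """
    n_annotators = len(labels)
    n_items = len(labels[0])
    result = []
    for i in range(n_items):
        votes = sum(row[i] for row in labels)
        result.append(1 if 2 * votes > n_annotators else 0)
    return result


def intersect_positive(a: List[int], b: List[int]) -> List[int]:
    """Keep a boundary only where both label sequences propose one."""
    return [1 if x == 1 and y == 1 else 0 for x, y in zip(a, b)]


# Toy crowd data (made up): three annotators, five utterances.
crowd = [
    [0, 1, 0, 0, 1],
    [0, 1, 0, 0, 0],
    [0, 0, 1, 0, 1],
]
baseline = majority_vote(crowd)

# Hypothetical outputs of two different inference methods.
method_a = [0, 1, 1, 0, 1]
method_b = [0, 1, 0, 0, 1]
agreed = intersect_positive(method_a, method_b)
```

Majority vote treats each utterance independently, which is the assumption the paper's sequential methods are designed to relax.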

Cite

Huang, Z., Zhong, J., & Passonneau, R. J. (2015). Estimation of discourse segmentation labels from crowd data. In Conference Proceedings - EMNLP 2015: Conference on Empirical Methods in Natural Language Processing (pp. 2190–2200). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/d15-1261
