To computationally model discourse phenomena such as argumentation, we need corpora with reliable annotation of the phenomena under study. Annotating complex discourse phenomena poses two challenges: fuzziness of unit boundaries and the need for multiple annotators. We show that current metrics for inter-annotator agreement (IAA), such as P/R/F1 and Krippendorff's α, provide inconsistent results for the same text. In addition, IAA metrics do not tell us which parts of a text are easier or harder for human judges to annotate, and so do not provide sufficiently specific information for evaluating systems that automatically identify discourse units. We propose a hierarchical clustering approach that aggregates overlapping text segments identified by multiple annotators; the more annotators who identify a text segment, the easier we assume the segment is to annotate. The clusters make it possible to quantify the extent of agreement judges show about text segments; this information can be used to assess the output of systems that automatically identify discourse units.
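The abstract does not specify the clustering algorithm in detail, but the core idea of grouping overlapping annotator spans and using the number of contributing annotators as an agreement signal can be illustrated. The following is a minimal sketch, not the authors' implementation: it assumes spans are (start, end) character offsets per annotator and uses a simple greedy single-link merge of overlapping spans; the function names, data layout, and toy offsets are illustrative assumptions.

```python
def overlaps(a, b):
    """True if two (start, end) character spans share at least one position."""
    return a[0] < b[1] and b[0] < a[1]


def cluster_spans(annotations):
    """Greedy single-link clustering of overlapping spans.

    annotations: dict mapping annotator id -> list of (start, end) spans.
    Returns a list of clusters, each with the merged extent, the member
    spans, and the set of annotators who contributed to it.
    """
    # Flatten to (annotator, span) pairs, sorted by start offset.
    items = sorted(
        ((ann, span) for ann, spans in annotations.items() for span in spans),
        key=lambda x: x[1],
    )
    clusters = []
    for ann, span in items:
        placed = False
        for c in clusters:
            if overlaps(span, c["extent"]):
                # Extend the cluster to cover the new overlapping span.
                c["extent"] = (min(c["extent"][0], span[0]),
                               max(c["extent"][1], span[1]))
                c["spans"].append((ann, span))
                c["annotators"].add(ann)
                placed = True
                break
        if not placed:
            clusters.append({"extent": span,
                             "spans": [(ann, span)],
                             "annotators": {ann}})
    return clusters


if __name__ == "__main__":
    # Toy example: three annotators marking argumentative segments by offset.
    annotations = {
        "A1": [(0, 40), (60, 95)],
        "A2": [(5, 38), (120, 150)],
        "A3": [(0, 42), (62, 90), (118, 152)],
    }
    for c in cluster_spans(annotations):
        # The size of the annotator set is the agreement signal: segments
        # identified by more annotators are assumed easier to annotate.
        print(c["extent"], "annotators:", sorted(c["annotators"]))
```

In this toy run, the segment near offsets 0–42 is found by all three annotators (high agreement), while the other two clusters are found by only two, giving the kind of graded, segment-level agreement information the abstract argues single-number IAA metrics cannot provide.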
CITATION STYLE
Wacholder, N., Muresan, S., Ghosh, D., & Aakhus, M. (2014). Annotating multiparty discourse: Challenges for agreement metrics. In Proceedings of the 8th Linguistic Annotation Workshop (LAW VIII), held in conjunction with COLING 2014 (pp. 120–128). Association for Computational Linguistics. https://doi.org/10.3115/v1/w14-4918