Following the works of Carletta (1996) and Artstein and Poesio (2008), there is an increasing consensus within the field that in order to properly gauge the reliability of an annotation effort, chance-corrected measures of inter-annotator agreement should be used. With this in mind, it is striking that virtually all evaluations of syntactic annotation efforts use uncorrected parser evaluation metrics such as bracket F1 (for phrase structure) and accuracy scores (for dependencies). In this work we present a chance-corrected metric based on Krippendorff's α, adapted to the structure of syntactic annotations and applicable both to phrase structure and dependency annotation without any modifications. To evaluate our metric we first present a number of synthetic experiments to better control the sources of noise and gauge the metric's responses, before finally contrasting the behaviour of our chance-corrected metric with that of uncorrected parser evaluation metrics on real corpora. © 2014 Association for Computational Linguistics.
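For orientation, Krippendorff's α is defined as α = 1 − D_o / D_e, where D_o is the observed disagreement within items and D_e the disagreement expected by chance. The sketch below is a minimal, generic illustration of that formula with a pluggable distance function; it assumes every annotator labels every item, and the names (krippendorff_alpha, distance) are illustrative. It is not the paper's syntax-specific distance, which would be plugged in via the distance argument.

    from itertools import combinations

    def krippendorff_alpha(items, distance):
        """Generic Krippendorff's alpha for fully annotated items.

        items[i] is the list of all annotators' annotations for item i.
        distance(a, b) is a symmetric dissimilarity with distance(a, a) == 0.
        Assumes every annotator annotates every item (a simplification).
        """
        # Observed disagreement: mean pairwise distance within each item.
        within = [distance(a, b)
                  for labels in items
                  for a, b in combinations(labels, 2)]
        d_o = sum(within) / len(within)

        # Expected disagreement: mean pairwise distance over all
        # annotations pooled together, ignoring item boundaries.
        pooled = [lab for labels in items for lab in labels]
        between = [distance(a, b) for a, b in combinations(pooled, 2)]
        d_e = sum(between) / len(between)

        # d_e == 0 means every annotation is identical; alpha is undefined then.
        return 1.0 - d_o / d_e

    # Toy usage with a nominal 0/1 distance; a syntax-aware metric would
    # supply a tree- or dependency-based distance instead.
    data = [["NOUN", "NOUN"], ["VERB", "NOUN"], ["ADJ", "ADJ"]]
    nominal = lambda a, b: 0.0 if a == b else 1.0
    print(krippendorff_alpha(data, nominal))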
CITATION STYLE
Skjærholt, A. (2014). A chance-corrected measure of inter-annotator agreement for syntax. In 52nd Annual Meeting of the Association for Computational Linguistics, ACL 2014 - Proceedings of the Conference (Vol. 1, pp. 934–944). Association for Computational Linguistics (ACL). https://doi.org/10.3115/v1/p14-1088