As researchers developing robust NLP for a wide range of text types, we are often confronted with the prejudice that annotation of non-canonical language (whatever that means) is somehow more arbitrary than annotation of canonical language. To investigate this, we present a small annotation study in which annotators were asked, with minimal guidelines, to identify main predicates and arguments in sentences across five domains, ranging from newswire to Twitter. Our study indicates that (at least such) annotation of non-canonical language is not harder. However, we also observe that agreement in the social media domains correlates less with model confidence, suggesting that annotators may disagree for different reasons when annotating social media data.
CITATION STYLE
Plank, B., Martínez Alonso, H., & Søgaard, A. (2015). Non-canonical language is not harder to annotate than canonical language. In LAW 2015 - 9th Linguistic Annotation Workshop, held in conjunction with NAACL 2015 - Proceedings of the Workshop (pp. 148–151). Association for Computational Linguistics (ACL). https://doi.org/10.3115/v1/w15-1617