Aggregating and learning from multiple annotators


Abstract

The success of NLP research is founded on high-quality annotated datasets, which are usually obtained from multiple expert annotators or crowd workers. The standard practice for training machine learning models is to first adjudicate the disagreements and then perform the training. To this end, there has been a lot of work on aggregating annotations, particularly for classification tasks. However, many other tasks, particularly in NLP, have unique characteristics not considered by standard models of annotation, e.g., label interdependencies in sequence labelling tasks, unrestricted labels for anaphoric annotation, or preference labels for ranking texts. In recent years, researchers have picked up on this and are closing the gap. A first objective of this tutorial is to connect NLP researchers with state-of-the-art aggregation models for a diverse set of canonical language annotation tasks. There is also a growing body of recent work arguing that following the convention and training with adjudicated labels ignores any uncertainty the labellers had in their classifications, which results in models with poorer generalisation capabilities. Therefore, a second objective of this tutorial is to teach NLP practitioners how they can augment their (deep) neural models to learn from data with multiple interpretations.
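
The following is a minimal sketch, not taken from the tutorial itself, contrasting the two regimes the abstract describes: adjudicating disagreements before training (here via simple majority voting) versus keeping the full label distribution as a soft target. The toy labels, class names, and model probabilities are illustrative assumptions.

    # Illustrative sketch: adjudicate-then-train vs. training on soft labels.
    from collections import Counter
    import numpy as np

    # Toy item: five annotators label one sentence for sentiment.
    LABELS = ["negative", "neutral", "positive"]
    annotations = ["positive", "positive", "neutral", "positive", "neutral"]

    # Convention 1: adjudicate first (majority vote), then train on the single
    # "gold" label. Any uncertainty among the annotators is discarded.
    gold_label = Counter(annotations).most_common(1)[0][0]
    hard_target = np.eye(len(LABELS))[LABELS.index(gold_label)]

    # Convention 2: keep the disagreement as a soft label, i.e. the empirical
    # distribution over classes, and train against it with cross-entropy.
    counts = np.array([annotations.count(l) for l in LABELS], dtype=float)
    soft_target = counts / counts.sum()          # [0.0, 0.4, 0.6]

    model_probs = np.array([0.1, 0.3, 0.6])      # hypothetical model output

    def cross_entropy(target, probs):
        return -np.sum(target * np.log(probs))

    print("adjudicated label:", gold_label)
    print("loss vs. adjudicated label:", cross_entropy(hard_target, model_probs))
    print("loss vs. soft label:       ", cross_entropy(soft_target, model_probs))

The soft-label loss penalises the model for assigning probability mass away from every interpretation the annotators gave, rather than only from the majority choice; this is one simple way to let a neural model learn from data with multiple interpretations.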

Citation (APA)

Paun, S., & Simpson, E. (2021). Aggregating and learning from multiple annotators. In EACL 2021 - 16th Conference of the European Chapter of the Association for Computational Linguistics, Tutorial Abstracts (pp. 6–9). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2021.eacl-tutorials.2
