Abstract
One of the key steps in language resource creation is identifying the text segments to be annotated, or markables: in our case, the (potentially nested) noun phrases, or mentions, used in coreference resolution. In this paper, we present a method for identifying markables for coreference annotation that combines high-performance automatic markable detectors with checking through a Game-With-A-Purpose (GWAP) and aggregation using a Bayesian annotation model. The method was evaluated on news data and on data from a variety of other genres. It improves F1 on mention boundaries by over seven percentage points compared with a state-of-the-art, domain-independent automatic mention detector, and by almost three points compared with an in-domain mention detector. One of the key contributions of our proposal is its applicability to nested markables, as is the case with coreference markables; the GWAP and several of the proposed markable detectors are also task- and language-independent and are thus applicable to a variety of other annotation scenarios.
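To make the "F1 of mention boundaries" figure concrete, the following is a minimal sketch of an exact-match span F1 that handles nested markables. It assumes a mention counts as correct only when both of its boundaries match a gold mention exactly; the abstract does not spell out the scoring scheme, so this definition, the `boundary_f1` function name, and the token-offset span representation are illustrative assumptions rather than the authors' evaluation code.

```python
from typing import Set, Tuple

# A markable is represented as (start_token, end_token_exclusive).
# This representation is an assumption made for illustration.
Span = Tuple[int, int]


def boundary_f1(predicted: Set[Span], gold: Set[Span]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) under exact boundary matching.

    Nested markables need no special handling: each span is compared
    as a whole, so an outer and an inner mention are scored separately.
    """
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example: tokens "the sister of the man" contain an outer
# mention (0, 5) and a nested inner mention "the man" at (3, 5).
gold = {(0, 5), (3, 5)}
predicted = {(0, 5), (2, 5)}  # inner mention has a wrong left boundary
precision, recall, f1 = boundary_f1(predicted, gold)
```

In this toy case the outer mention is matched and the inner one is not, so precision, recall, and F1 all come out at one half.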
Cite
Madge, C., Yu, J., Chamberlain, J., Kruschwitz, U., Paun, S., & Poesio, M. (2020). Crowdsourcing and aggregating nested markable annotations. In ACL 2019 - 57th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (pp. 797–807). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/p19-1077