The OntoNotes corpus is widely used for training and testing coreference resolution systems, but little attention has so far been given to the differences between the genres of language that make up the corpus. We are primarily interested in the contrast between spoken and written language, and we therefore conducted in-depth analyses of various reference-related properties of the OntoNotes sub-corpora, which yield several statistically significant differences. We compare these findings to predictions made in the linguistics literature and draw some conclusions for potential genre-specific implementations of coreference resolution.
Aktaş, B., Scheffler, T., & Stede, M. (2019). Coreference in English OntoNotes: Properties and Genre Differences. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11697 LNAI, pp. 171–184). Springer Verlag. https://doi.org/10.1007/978-3-030-27947-9_15