A challenge set and methods for noun-verb ambiguity

13Citations
Citations of this article
95Readers
Mendeley users who have this article in their library.

Abstract

English part-of-speech taggers regularly make egregious errors related to noun-verb ambiguity, despite having achieved 97%+ accuracy on the WSJ Penn Treebank since 2002. These mistakes have been difficult to quantify and make taggers less useful to downstream tasks such as translation and text-to-speech synthesis. This paper creates a new dataset of over 30,000 naturally-occurring non-trivial examples of noun-verb ambiguity. Taggers within 1% of each other when measured on the WSJ have accuracies ranging from 57% to 75% accuracy on this challenge set. Enhancing the strongest existing tagger with contextual word embeddings and targeted training data improves its accuracy to 89%, a 14% absolute (52% relative) improvement. Downstream, using just this enhanced tagger yields a 28% reduction in error over the prior best learned model for homograph disambiguation for text-to-speech synthesis.

Cite

CITATION STYLE

APA

Elkahky, A., Webster, K., Andor, D., & Pitler, E. (2018). A challenge set and methods for noun-verb ambiguity. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, EMNLP 2018 (pp. 2562–2572). Association for Computational Linguistics. https://doi.org/10.18653/v1/d18-1277

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free