Mining for unambiguous instances to adapt part-of-speech taggers to new domains

Abstract

We present a simple yet effective approach to adapting part-of-speech (POS) taggers to new domains. Our approach requires only a dictionary and large amounts of unlabeled target data. The idea is to use the dictionary to mine the unlabeled target data for unambiguous word sequences, thus effectively collecting labeled target data. We add the mined instances to available labeled newswire data to train a POS tagger for the target domain. The induced models significantly improve tagging accuracy on held-out test sets across three domains (Twitter, spoken language, and search queries). We also present results for Dutch, Spanish, and Portuguese Twitter data, and provide two novel manually annotated test sets.
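
The core mining step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes that a token counts as unambiguous when the dictionary lists exactly one tag for it, that a mined instance is a maximal run of consecutive unambiguous tokens, and that runs shorter than a hypothetical `min_len` threshold are discarded. The function name `mine_unambiguous`, the lowercased dictionary lookup, and the toy dictionary and tags are all illustrative.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical tag dictionary: lowercased word -> set of admissible POS tags.
TagDict = Dict[str, Set[str]]


def mine_unambiguous(
    sentences: List[List[str]],
    tag_dict: TagDict,
    min_len: int = 3,  # hypothetical minimum run length, not from the paper
) -> List[List[Tuple[str, str]]]:
    """Collect maximal runs of tokens whose every word has exactly one
    admissible tag in tag_dict; keep only runs of length >= min_len."""
    mined: List[List[Tuple[str, str]]] = []
    for tokens in sentences:
        run: List[Tuple[str, str]] = []
        for tok in tokens:
            tags = tag_dict.get(tok.lower())  # assumption: case-insensitive lookup
            if tags is not None and len(tags) == 1:
                # Unambiguous token: extend the current run with its only tag.
                run.append((tok, next(iter(tags))))
            else:
                # Unknown or ambiguous token closes the current run.
                if len(run) >= min_len:
                    mined.append(run)
                run = []
        # Flush a run that reaches the end of the sentence.
        if len(run) >= min_len:
            mined.append(run)
    return mined


# Toy example (illustrative dictionary and unlabeled sentences).
tag_dict: TagDict = {
    "the": {"DET"}, "cat": {"NOUN"}, "sat": {"VERB"},
    "on": {"ADP"}, "mat": {"NOUN"}, "run": {"NOUN", "VERB"},
}
sentences = [
    ["The", "cat", "sat", "on", "the", "mat"],  # fully unambiguous
    ["cats", "run", "the", "race"],             # unknown + ambiguous words
]
mined = mine_unambiguous(sentences, tag_dict)
print(mined)
```

In this sketch the first sentence is mined as a complete tagged sequence, while the second yields nothing because "cats" and "race" are missing from the dictionary and "run" has two candidate tags. Following the approach described above, such mined sequences would then be added to the labeled newswire training data; any further filtering or weighting is not modeled here.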

Citation (APA)

Hovy, D., Plank, B., Alonso, H. M., & Søgaard, A. (2015). Mining for unambiguous instances to adapt part-of-speech taggers to new domains. In NAACL HLT 2015 - 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Conference (pp. 1256–1261). Association for Computational Linguistics (ACL). https://doi.org/10.3115/v1/n15-1135
