Abstract
We present a simple, yet effective approach to adapt part-of-speech (POS) taggers to new domains. Our approach only requires a dictionary and large amounts of unlabeled target data. The idea is to use the dictionary to mine the unlabeled target data for unambiguous word sequences, thus effectively collecting labeled target data. We add the mined instances to available labeled newswire data to train a POS tagger for the target domain. The induced models significantly improve tagging accuracy on held-out test sets across three domains (Twitter, spoken language, and search queries). We also present results for Dutch, Spanish and Portuguese Twitter data, and provide two novel manually-annotated test sets.
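The mining step described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the dictionary maps each lowercased word to its set of possible tags, and that a sequence counts as unambiguous only when every token has exactly one dictionary tag. The name `mine_unambiguous`, the toy dictionary, and the tag labels are all hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

TagDict = Dict[str, Set[str]]
TaggedSentence = List[Tuple[str, str]]


def mine_unambiguous(sentences: Iterable[List[str]],
                     tag_dict: TagDict) -> List[TaggedSentence]:
    """Keep only sentences in which every token has exactly one dictionary tag.

    Assumption (not from the paper): lookup is case-insensitive, and a single
    unknown or ambiguous token causes the whole sentence to be discarded.
    """
    mined: List[TaggedSentence] = []
    for tokens in sentences:
        tagged: TaggedSentence = []
        for token in tokens:
            tags = tag_dict.get(token.lower(), set())
            if len(tags) != 1:
                break  # unknown or ambiguous word: skip this sentence
            tagged.append((token, next(iter(tags))))
        else:
            # Reached only when the inner loop finished without a break.
            if tagged:
                mined.append(tagged)
    return mined


# Toy dictionary and unlabeled target-domain data (illustrative only).
toy_dict: TagDict = {
    "the": {"DET"},
    "cat": {"NOUN"},
    "sleeps": {"VERB"},
    "run": {"NOUN", "VERB"},  # ambiguous entry
}
unlabeled = [
    ["The", "cat", "sleeps"],  # every token unambiguous: kept
    ["the", "run"],            # "run" is ambiguous: dropped
    ["lol", "cat"],            # "lol" is not in the dictionary: dropped
]

mined = mine_unambiguous(unlabeled, toy_dict)
```

Under these assumptions, the mined sentences would then be appended to the labeled newswire sentences before training a tagger for the target domain.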
Cite
Hovy, D., Plank, B., Alonso, H. M., & Søgaard, A. (2015). Mining for unambiguous instances to adapt part-of-speech taggers to new domains. In NAACL HLT 2015 - 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Conference (pp. 1256–1261). Association for Computational Linguistics (ACL). https://doi.org/10.3115/v1/n15-1135