Mining for unambiguous instances to adapt part-of-speech taggers to new domains

Dirk Hovy; Barbara Plank; Héctor Martínez Alonso; Anders Søgaard

Conference Proceedings

NAACL HLT 2015 - 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Conference (2015) 1256-1261

DOI: 10.3115/v1/n15-1135

Abstract

We present a simple yet effective approach to adapt part-of-speech (POS) taggers to new domains. Our approach requires only a dictionary and large amounts of unlabeled target data. The idea is to use the dictionary to mine the unlabeled target data for unambiguous word sequences, thus effectively collecting labeled target data. We add the mined instances to available labeled newswire data to train a POS tagger for the target domain. The induced models significantly improve tagging accuracy on held-out test sets across three domains (Twitter, spoken language, and search queries). We also present results for Dutch, Spanish, and Portuguese Twitter data, and provide two novel manually annotated test sets.
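
The mining step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `mine_unambiguous`, the toy dictionary, the tag names, and the lowercased lookup are all assumptions made for illustration. The sketch also assumes that a sequence is kept only when every token has exactly one tag in the dictionary; the abstract does not say whether whole sentences or shorter sub-sequences are mined.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical types: a dictionary maps a lowercased word to its possible POS tags.
TagDict = Dict[str, Set[str]]
TaggedSequence = List[Tuple[str, str]]


def mine_unambiguous(sentences: List[List[str]], tag_dict: TagDict) -> List[TaggedSequence]:
    """Keep only sequences in which every token has exactly one dictionary tag."""
    mined: List[TaggedSequence] = []
    for tokens in sentences:
        tagged: TaggedSequence = []
        for token in tokens:
            tags = tag_dict.get(token.lower())
            # Unknown or ambiguous word: discard the whole sequence.
            if tags is None or len(tags) != 1:
                break
            tagged.append((token, next(iter(tags))))
        else:
            # Reached only if no token broke the loop; skip empty sequences.
            if tagged:
                mined.append(tagged)
    return mined


# Toy data, invented for illustration only.
toy_dict: TagDict = {
    "the": {"DET"},
    "cat": {"NOUN"},
    "sleeps": {"VERB"},
    "run": {"NOUN", "VERB"},  # ambiguous entry
}
unlabeled = [
    ["The", "cat", "sleeps"],  # every token unambiguous: kept
    ["the", "run"],            # contains an ambiguous token: dropped
    ["lol", "cat"],            # contains an unknown token: dropped
]
mined = mine_unambiguous(unlabeled, toy_dict)
```

Under these assumptions, the mined sequences would then be appended to the labeled newswire training set before training the tagger, as the abstract describes.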

Cite

Hovy, D., Plank, B., Alonso, H. M., & Søgaard, A. (2015). Mining for unambiguous instances to adapt part-of-speech taggers to new domains. In NAACL HLT 2015 - 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Conference (pp. 1256–1261). Association for Computational Linguistics (ACL). https://doi.org/10.3115/v1/n15-1135
