Abstract
In this paper, we describe and evaluate a bigram part-of-speech (POS) tagger that uses latent annotations, and then investigate using additional genre-matched unlabeled data for self-training the tagger. The use of latent annotations substantially improves the performance of a baseline HMM bigram tagger, outperforming a trigram HMM tagger with sophisticated smoothing. The performance of the latent tagger is further enhanced by self-training with a large set of unlabeled data, even in cases where standard bigram or trigram taggers trained on larger amounts of labeled data do not benefit from self-training. Our best model obtains a state-of-the-art Chinese tagging accuracy of 94.78% when evaluated on a representative test set of the Penn Chinese Treebank 6.0.
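The baseline model described above is a bigram HMM tagger, which assigns each word the tag sequence maximizing the product of tag-to-tag transition probabilities and tag-to-word emission probabilities, decoded with the Viterbi algorithm (the latent-annotation extension then splits each tag into automatically learned subtags). A minimal sketch of bigram Viterbi decoding, with hypothetical toy probabilities in place of treebank-estimated ones, might look like this:

```python
import math

# Toy transition and emission probabilities (hypothetical, for illustration
# only; a real tagger estimates these from labeled treebank counts).
TRANSITIONS = {
    ("<s>", "DT"): 0.6, ("<s>", "NN"): 0.4,
    ("DT", "NN"): 0.9, ("DT", "DT"): 0.1,
    ("NN", "VB"): 0.7, ("NN", "NN"): 0.3,
    ("VB", "DT"): 0.5, ("VB", "NN"): 0.5,
}
EMISSIONS = {
    ("DT", "the"): 0.7, ("NN", "dog"): 0.4,
    ("VB", "barks"): 0.3, ("NN", "barks"): 0.05,
}
UNSEEN = 1e-6  # crude floor standing in for real smoothing


def viterbi(words, tags=("DT", "NN", "VB")):
    """Return the most probable tag sequence for `words` under a bigram HMM."""
    # chart[i][t] = (log-prob of best path ending with tag t at position i,
    #                backpointer to the previous tag on that path)
    chart = [dict() for _ in words]
    for t in tags:
        p = (TRANSITIONS.get(("<s>", t), UNSEEN)
             * EMISSIONS.get((t, words[0]), UNSEEN))
        chart[0][t] = (math.log(p), None)
    for i in range(1, len(words)):
        for t in tags:
            chart[i][t] = max(
                (chart[i - 1][prev][0]
                 + math.log(TRANSITIONS.get((prev, t), UNSEEN))
                 + math.log(EMISSIONS.get((t, words[i]), UNSEEN)), prev)
                for prev in tags)
    # Pick the best final tag, then follow backpointers to recover the path.
    best_last = max(tags, key=lambda t: chart[-1][t][0])
    seq = [best_last]
    for i in range(len(words) - 1, 0, -1):
        seq.append(chart[i][seq[-1]][1])
    return list(reversed(seq))
```

For example, `viterbi(["the", "dog", "barks"])` returns `["DT", "NN", "VB"]` under the toy probabilities above. The paper's latent-annotation variant would run the same dynamic program over split subtags, and self-training would re-estimate the probability tables from automatically tagged unlabeled text.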
Citation
Huang, Z., Eidelman, V., & Harper, M. (2009). Improving a simple bigram HMM part-of-speech tagger by latent annotation and self-training. In NAACL-HLT 2009 - Human Language Technologies: 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Short Papers (pp. 213–216). Association for Computational Linguistics (ACL). https://doi.org/10.3115/1620853.1620911