This paper presents results for a maximum-entropy-based part-of-speech tagger, which achieves superior performance principally by enriching the information sources used for tagging. In particular, we get improved results by incorporating these features: (i) more extensive treatment of capitalization for unknown words; (ii) features for the disambiguation of the tense forms of verbs; (iii) features for disambiguating particles from prepositions and adverbs. The best resulting accuracy for the tagger on the Penn Treebank is 96.86% overall, and 86.91% on previously unseen words.
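As a rough illustration of item (i), the sketch below shows how capitalization cues for a single token might be encoded as binary indicators of the kind a maximum-entropy tagger consumes. The abstract does not list the paper's actual feature templates, so the function name, the feature names, and the choice of indicators are all assumptions made for illustration only.

```python
def capitalization_features(word, sentence_initial):
    """Return binary capitalization indicators for one token.

    All feature names are hypothetical; they are not taken from the paper.
    """
    init_cap = word[:1].isupper()
    return {
        # First character is an uppercase letter.
        "init_cap": init_cap,
        # Every cased character is uppercase (e.g. acronyms).
        "all_caps": word.isupper(),
        # An uppercase letter appears after the first character.
        "inner_cap": any(c.isupper() for c in word[1:]),
        # Capitalized although not at the start of the sentence, where
        # capitalization is less informative.
        "init_cap_mid_sentence": init_cap and not sentence_initial,
    }


if __name__ == "__main__":
    for token, first in [("The", True), ("McDonald", False), ("IBM", False)]:
        print(token, capitalization_features(token, first))
```

In a tagger, indicators like these would typically be conjoined with the candidate tag to form the model's features; that step is omitted here.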
CITATION STYLE
Toutanova, K., & Manning, C. D. (2000). Enriching the Knowledge Sources Used in a Maximum Entropy Part-of-Speech Tagger. In Proceedings of the 2000 Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora, SIGDAT-EMNLP 2000 - Held in conjunction with the 38th Annual Meeting of the Association for Computational Linguistics, ACL 2000 (pp. 63–70). Association for Computational Linguistics (ACL). https://doi.org/10.3115/1117794.1117802