Enriching the Knowledge Sources Used in a Maximum Entropy Part-of-Speech Tagger


Abstract

This paper presents results for a maximum-entropy-based part-of-speech tagger, which achieves superior performance principally by enriching the information sources used for tagging. In particular, we obtain improved results by incorporating the following features: (i) more extensive treatment of capitalization for unknown words; (ii) features for disambiguating the tense forms of verbs; (iii) features for disambiguating particles from prepositions and adverbs. The best resulting accuracy of the tagger on the Penn Treebank is 96.86% overall and 86.91% on previously unseen words.
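To make the idea of "enriching the information sources" concrete, the sketch below shows how a conditional maximum entropy (log-linear) tagger turns binary feature functions f_i(h, t) over a word's context h and a candidate tag t into a distribution p(t | h) = exp(Σ_i λ_i f_i(h, t)) / Z(h). It is a minimal illustration under stated assumptions, not the authors' implementation: the `History` fields, the feature names, the tag subset and the weights λ_i are all hypothetical, loosely modelled on the abstract's capitalization and verb-tense feature categories. The weights are hand-picked rather than estimated from Penn Treebank data, and a full tagger would also search over tag sequences for the whole sentence.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class History:
    """Hypothetical context h for the word being tagged."""
    word: str
    prev_word: str
    position: int  # index of the word in the sentence (0 = sentence-initial)
    unknown: bool  # True if the word was not seen in training


Feature = Callable[[History, str], int]


def is_capitalized(word: str) -> bool:
    return word[:1].isupper()


# Hypothetical binary features f_i(h, t); NOT the paper's exact feature set.
FEATURES: List[Tuple[str, Feature]] = [
    # capitalization of unknown words
    ("unknown_cap_NNP", lambda h, t: int(h.unknown and is_capitalized(h.word) and t == "NNP")),
    ("unknown_cap_start_NN",
     lambda h, t: int(h.unknown and is_capitalized(h.word) and h.position == 0 and t == "NN")),
    # verb tense: past tense vs. past participle after a form of "have"
    ("suffix_ed_VBD", lambda h, t: int(h.word.endswith("ed") and t == "VBD")),
    ("suffix_ed_VBN_after_have",
     lambda h, t: int(h.word.endswith("ed") and h.prev_word.lower() in {"has", "have", "had"}
                      and t == "VBN")),
    ("lowercase_NN", lambda h, t: int(not is_capitalized(h.word) and t == "NN")),
]

# Illustrative, hand-set weights lambda_i (a real tagger estimates them from training data).
WEIGHTS: Dict[str, float] = {
    "unknown_cap_NNP": 2.0,
    "unknown_cap_start_NN": 1.5,
    "suffix_ed_VBD": 1.0,
    "suffix_ed_VBN_after_have": 2.5,
    "lowercase_NN": 0.5,
}

TAGS = ["NN", "NNP", "VBD", "VBN"]  # small hypothetical tag subset


def score(h: History, t: str) -> float:
    """Weighted feature sum: sum_i lambda_i * f_i(h, t)."""
    return sum(WEIGHTS[name] * f(h, t) for name, f in FEATURES)


def tag_distribution(h: History) -> Dict[str, float]:
    """p(t | h) = exp(score(h, t)) / Z(h), computed stably by subtracting the max score."""
    scores = {t: score(h, t) for t in TAGS}
    top = max(scores.values())
    exps = {t: math.exp(s - top) for t, s in scores.items()}
    z = sum(exps.values())
    return {t: e / z for t, e in exps.items()}


def best_tag(h: History) -> str:
    dist = tag_distribution(h)
    return max(dist, key=dist.get)


if __name__ == "__main__":
    print(best_tag(History("Gorbachev", "said", 3, True)))  # capitalized unknown word
    print(best_tag(History("walked", "had", 2, False)))     # after "had"
    print(best_tag(History("walked", "she", 1, False)))     # after a pronoun
```

With these illustrative weights, the capitalized unknown word gets NNP, "walked" after "had" gets VBN, and "walked" after "she" gets VBD. The point of the design is that adding a new knowledge source only means adding a feature function and letting training assign it a weight; the model itself does not change.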

Citation (APA)

Toutanova, K., & Manning, C. D. (2000). Enriching the Knowledge Sources Used in a Maximum Entropy Part-of-Speech Tagger. In Proceedings of the 2000 Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora, SIGDAT-EMNLP 2000 - Held in conjunction with the 38th Annual Meeting of the Association for Computational Linguistics, ACL 2000 (pp. 63–70). Association for Computational Linguistics (ACL). https://doi.org/10.3115/1117794.1117802
