Truecasing

77Citations
Citations of this article
158Readers
Mendeley users who have this article in their library.

Abstract

Truecasing is the process of restoring case information to badly-cased or non-cased text. This paper explores truecasing issues and proposes a statistical, language modeling based truecaser which achieves an accuracy of ∼98% on news articles. Task based evaluation shows a 26% F-measure improvement in named entity recognition when using truecasing. In the context of automatic content extraction, mention detection on automatic speech recognition text is also improved by a factor of 8. Truecasing also enhances machine translation output legibility and yields a BLEU score improvement of 80.2%. This paper argues for the use of truecasing as a valuable component in text processing applications.

Cite

CITATION STYLE

APA

Lita, L. V., Ittycheriah, A., Roukos, S., & Kambhatla, N. (2003). Truecasing. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (Vol. 2003-July). Association for Computational Linguistics (ACL). https://doi.org/10.3115/1075096.1075116

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free