Truecasing

Lucian Vlad Lita; Abe Ittycheriah; Salim Roukos; Nanda Kambhatla

Conference Proceedings, Open Access

Proceedings of the Annual Meeting of the Association for Computational Linguistics (2003) 2003-July

DOI: 10.3115/1075096.1075116

Abstract

Truecasing is the process of restoring case information to badly cased or non-cased text. This paper explores truecasing issues and proposes a statistical, language-modeling-based truecaser that achieves an accuracy of approximately 98% on news articles. Task-based evaluation shows a 26% F-measure improvement in named entity recognition when using truecasing. In the context of automatic content extraction, mention detection on automatic speech recognition text is also improved by a factor of 8. Truecasing also enhances machine translation output legibility and yields a BLEU score improvement of 80.2%. This paper argues for the use of truecasing as a valuable component in text processing applications.
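To make the task concrete, here is a minimal sketch of a truecaser. It is not the model proposed in the paper: it assumes a much simpler unigram rule that picks, for each token, the surface form seen most often in cased training text. The function names (`train_case_model`, `truecase`) and the toy corpus are hypothetical and serve only as illustration.

```python
from collections import Counter, defaultdict


def train_case_model(cased_sentences):
    """Count how often each surface form appears for every lowercased token."""
    counts = defaultdict(Counter)
    for sentence in cased_sentences:
        tokens = sentence.split()
        # Skip the first token: sentence-initial capitalization says little
        # about a word's usual case.
        for token in tokens[1:]:
            counts[token.lower()][token] += 1
    return counts


def truecase(text, counts):
    """Restore case token by token using the most frequent observed form."""
    restored = []
    for token in text.split():
        forms = counts.get(token.lower())
        # Unseen tokens fall back to lowercase (an assumption of this sketch).
        restored.append(forms.most_common(1)[0][0] if forms else token.lower())
    if restored:
        # Capitalize the sentence-initial token, leaving the rest untouched.
        restored[0] = restored[0][:1].upper() + restored[0][1:]
    return " ".join(restored)


# Toy training data, invented for illustration only.
corpus = [
    "Yesterday IBM opened a lab in New York",
    "The lab in New York works with IBM researchers",
]
model = train_case_model(corpus)
print(truecase("the ibm lab in new york", model))  # The IBM lab in New York
```

A unigram rule like this ignores context, so it cannot separate forms such as "US" and "us"; a language-model-based approach of the kind the abstract describes would score whole cased sequences instead.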

Cite

Lita, L. V., Ittycheriah, A., Roukos, S., & Kambhatla, N. (2003). Truecasing. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (Vol. 2003-July). Association for Computational Linguistics (ACL). https://doi.org/10.3115/1075096.1075116
