Periods, capitalized words, etc.


Abstract

In this article we present an approach to three important aspects of text normalization: sentence boundary disambiguation, disambiguation of capitalized words in positions where capitalization is expected, and identification of abbreviations. In contrast to the two dominant techniques, computing statistics and writing specialized grammars, our document-centered approach works by considering suggestive local contexts and repetitions of individual words within a document. This approach proved robust to domain shifts and new lexica, and achieved performance on a par with the highest reported results. When incorporated into a part-of-speech tagger, it significantly reduced the error rate on capitalized words and sentence boundaries. We also investigated portability to other languages and obtained encouraging results.
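The idea of using repetitions of a word within one document can be illustrated with a minimal sketch. This is not the article's implementation: the function name `classify_sentence_initial`, the tokenization, and the three labels are assumptions made for illustration. A capitalized word in sentence-initial position is ambiguous, so the sketch looks for unambiguous mid-sentence occurrences of the same word elsewhere in the document.

```python
import re
from collections import Counter

BOUNDARY = {".", "!", "?"}

def classify_sentence_initial(text):
    """Label capitalized sentence-initial words using evidence from
    unambiguous (mid-sentence) occurrences in the same document.
    Illustrative sketch only; names and labels are hypothetical."""
    tokens = re.findall(r"[A-Za-z]+|[.!?]", text)
    lower_mid = Counter()   # word seen lowercased in mid-sentence
    upper_mid = Counter()   # word seen capitalized in mid-sentence
    ambiguous = []          # capitalized words right after a boundary
    prev = "."              # document start counts as a boundary
    for tok in tokens:
        if tok not in BOUNDARY:
            if prev in BOUNDARY:
                if tok[0].isupper():
                    ambiguous.append(tok)
            elif tok[0].isupper():
                upper_mid[tok.lower()] += 1
            else:
                lower_mid[tok.lower()] += 1
        prev = tok

    labels = {}
    for tok in ambiguous:
        key = tok.lower()
        if lower_mid[key] and not upper_mid[key]:
            labels[tok] = "common"    # only lowercase evidence
        elif upper_mid[key] and not lower_mid[key]:
            labels[tok] = "proper"    # only capitalized evidence
        else:
            labels[tok] = "unknown"   # no evidence, or conflicting
    return labels

doc = ("Brown arrived late. Yesterday Mr Brown spoke. "
       "The talks ended yesterday. Nobody saw the end.")
labels = classify_sentence_initial(doc)
```

In this toy document, "Brown" also occurs capitalized in mid-sentence and "yesterday" also occurs in lowercase, so the two sentence-initial words receive different labels. "Nobody" has no other occurrence and stays undecided. The sketch treats every period as a sentence boundary, which is exactly the simplification that the abbreviation and boundary components described in the abstract are meant to remove.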


Mikheev, A. (2002). Periods, capitalized words, etc. Computational Linguistics, 28(3), 289–318. https://doi.org/10.1162/089120102760275992
