A maximum entropy approach to identifying sentence boundaries

Abstract

We present a trainable model for identifying sentence boundaries in raw text. Given a corpus annotated with sentence boundaries, our model learns to classify each occurrence of ., ?, and ! as either a valid or invalid sentence boundary. The training procedure requires no hand-crafted rules, lexica, part-of-speech tags, or domain-specific information. The model can therefore be trained easily on any genre of English, and should be trainable on any other Roman-alphabet language. Performance is comparable to or better than the performance of similar systems, but we emphasize the simplicity of retraining for new domains.
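
The classification setup described in the abstract can be illustrated with a minimal sketch. It treats every ., ?, and ! as a candidate and trains a binary maximum entropy model, which is equivalent to logistic regression, to label each candidate as a boundary or not. Everything specific here is an assumption made for illustration and is not taken from the paper: the feature templates in `features`, the toy corpus `CORPUS`, the function names, and the use of stochastic gradient ascent for parameter estimation.

```python
import math
import re

CANDIDATE = re.compile(r"[.?!]")  # potential sentence-ending punctuation


def features(text, i):
    """Hypothetical binary features for the candidate at text[i] (not the paper's set)."""
    start = text.rfind(" ", 0, i) + 1   # start of the token that holds the candidate
    prefix = text[start:i]              # token characters before the candidate
    rest = text[i + 1:]
    following = rest.split()
    nxt = following[0] if following else "<END>"
    return [
        "bias",
        "punct=" + text[i],
        "prefix=" + prefix.lower(),
        "prefix_short" if len(prefix) <= 3 else "prefix_long",
        "next_cap" if nxt[:1].isupper() else "next_not_cap",
        "space_after" if rest[:1] in ("", " ") else "no_space_after",
    ]


def build_examples(sentences):
    """Turn gold sentences into (features, label) pairs, one per candidate."""
    text = " ".join(sentences)
    gold, pos = set(), 0
    for s in sentences:
        pos += len(s)
        gold.add(pos - 1)  # index of the sentence-final character
        pos += 1           # the joining space
    return [(features(text, m.start()), 1 if m.start() in gold else 0)
            for m in CANDIDATE.finditer(text)]


def prob(weights, feats):
    """P(boundary | context) under the log-linear model."""
    z = sum(weights.get(f, 0.0) for f in feats)
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def train(examples, epochs=300, lr=0.2):
    """Fit weights by stochastic gradient ascent on the log-likelihood (an assumption)."""
    weights = {}
    for _ in range(epochs):
        for feats, y in examples:
            error = y - prob(weights, feats)
            for f in feats:
                weights[f] = weights.get(f, 0.0) + lr * error
    return weights


def split_sentences(text, weights, threshold=0.5):
    """Cut the text after every candidate the model scores above the threshold."""
    out, start = [], 0
    for m in CANDIDATE.finditer(text):
        i = m.start()
        if prob(weights, features(text, i)) > threshold:
            piece = text[start:i + 1].strip()
            if piece:
                out.append(piece)
            start = i + 1
    tail = text[start:].strip()
    if tail:
        out.append(tail)
    return out


# Toy annotated corpus (invented for this sketch): one gold sentence per item.
CORPUS = [
    "Dr. Smith arrived at 3 p.m. yesterday.",
    "He met Mr. Jones.",
    "Did it cost $4.50?",
    "Yes!",
    "The U.S. economy grew.",
    "Mrs. Lee agreed.",
]

weights = train(build_examples(CORPUS))
print(split_sentences(" ".join(CORPUS), weights))
```

Retraining for a new domain, in this sketch, amounts to calling `train` on examples built from a differently annotated corpus; no rule list or abbreviation lexicon is edited by hand. A corpus this small only demonstrates the mechanics and says nothing about accuracy on unseen text.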

Cite

Reynar, J. C., & Ratnaparkhi, A. (1997). A maximum entropy approach to identifying sentence boundaries. In 5th Conference on Applied Natural Language Processing, ANLP 1997 - Proceedings (pp. 16–19). Association for Computational Linguistics (ACL). https://doi.org/10.3115/974557.974561
