We present a trainable model for identifying sentence boundaries in raw text. Given a corpus annotated with sentence boundaries, our model learns to classify each occurrence of ., ?, and ! as either a valid or invalid sentence boundary. The training procedure requires no hand-crafted rules, lexica, part-of-speech tags, or domain-specific information. The model can therefore be trained easily on any genre of English, and should be trainable on any other Roman-alphabet language. Performance is comparable to or better than the performance of similar systems, but we emphasize the simplicity of retraining for new domains.
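As a rough illustration of the setup the abstract describes, the sketch below turns a boundary-annotated corpus into labeled training examples, one per occurrence of ., ?, or !. It is a minimal sketch under stated assumptions, not the authors' implementation: the function name `candidate_examples`, the input format (a list of gold sentences), and the choice of context (the text before and after the candidate within its whitespace-delimited token) are all hypothetical. No classifier is trained here; the examples would be fed to whatever learner is chosen.

```python
from typing import List, Tuple

# Punctuation marks treated as candidate sentence boundaries.
CANDIDATES = ".?!"

# (prefix, candidate, suffix, is_boundary); this layout is an assumption
# made for illustration, not the feature set used in the paper.
Example = Tuple[str, str, str, bool]


def candidate_examples(sentences: List[str]) -> List[Example]:
    """Build one labeled example per candidate mark in the gold sentences."""
    text = " ".join(sentences)

    # Record the character offset of the last character of each gold sentence.
    gold_ends = set()
    pos = 0
    for sentence in sentences:
        pos += len(sentence)
        gold_ends.add(pos - 1)
        pos += 1  # skip the joining space

    examples: List[Example] = []
    for i, ch in enumerate(text):
        if ch not in CANDIDATES:
            continue
        # Find the whitespace-delimited token that contains the candidate.
        start = text.rfind(" ", 0, i) + 1
        end = text.find(" ", i)
        if end == -1:
            end = len(text)
        prefix = text[start:i]
        suffix = text[i + 1:end]
        # A candidate is a true boundary only if a gold sentence ends on it.
        examples.append((prefix, ch, suffix, i in gold_ends))
    return examples


if __name__ == "__main__":
    for example in candidate_examples(["Dr. Smith paid $3.50.", "Did he?"]):
        print(example)
```

In this toy input, the periods in "Dr." and inside "$3.50" come out labeled as invalid boundaries, while the final period and the question mark are labeled valid, which is the distinction a trained classifier would have to learn from context.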
Reynar, J. C., & Ratnaparkhi, A. (1997). A maximum entropy approach to identifying sentence boundaries. In 5th Conference on Applied Natural Language Processing, ANLP 1997 - Proceedings (pp. 16–19). Association for Computational Linguistics (ACL). https://doi.org/10.3115/974557.974561