Adaptive sentence boundary disambiguation

Abstract

Labeling of sentence boundaries is a necessary prerequisite for many natural language processing tasks, including part-of-speech tagging and sentence alignment. End-of-sentence punctuation marks are ambiguous; to disambiguate them most systems use brittle, special-purpose regular expression grammars and exception rules. As an alternative, we have developed an efficient, trainable algorithm that uses a lexicon with part-of-speech probabilities and a feed-forward neural network. This work demonstrates the feasibility of using prior probabilities of part-of-speech assignments, as opposed to words or definite part-of-speech assignments, as contextual information. After training for less than one minute, the method correctly labels over 98.5% of sentence boundaries in a corpus of over 27,000 sentence-boundary marks. We show the method to be efficient and easily adaptable to different text genres, including single-case texts.
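The central idea of the abstract, representing the context around a punctuation mark by prior part-of-speech probabilities and passing it to a feed-forward network, can be illustrated with a minimal sketch. This is not the authors' implementation. The tag set, the toy lexicon and its probabilities, the capitalization flag, the context width `k`, the hidden-layer size, and all function names are assumptions chosen for illustration, and the network is left untrained, so its output score is arbitrary.

```python
import math
import random

# Hypothetical tag set and toy lexicon. The values stand in for prior
# probabilities P(tag | word); they are illustrative, not from the paper.
TAGS = ["noun", "verb", "abbreviation", "proper_noun", "other"]
LEXICON = {
    "dr": {"abbreviation": 0.9, "noun": 0.1},
    "smith": {"proper_noun": 1.0},
    "arrived": {"verb": 1.0},
    "he": {"other": 1.0},
    "left": {"verb": 0.7, "other": 0.3},
}


def descriptor(token):
    """Map a token to its POS priors plus a capitalization flag."""
    priors = LEXICON.get(token.lower())
    if priors is None:
        # Assumption: unknown words get a uniform prior over all tags.
        vec = [1.0 / len(TAGS)] * len(TAGS)
    else:
        vec = [priors.get(tag, 0.0) for tag in TAGS]
    return vec + [1.0 if token[:1].isupper() else 0.0]


def context_vector(tokens, i, k=2):
    """Concatenate descriptors of the k tokens before and after index i."""
    pad = [0.0] * (len(TAGS) + 1)  # used when the window runs off the text
    out = []
    for j in list(range(i - k, i)) + list(range(i + 1, i + k + 1)):
        out.extend(descriptor(tokens[j]) if 0 <= j < len(tokens) else pad)
    return out


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_weights(n_in, n_hidden, seed=0):
    """Small random weights for one hidden layer and one output unit."""
    rng = random.Random(seed)
    w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                for _ in range(n_hidden)]
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    return w_hidden, w_out


def forward(x, w_hidden, w_out):
    """Forward pass: input -> sigmoid hidden layer -> sigmoid output."""
    hidden = [sigmoid(sum(w * v for w, v in zip(row, x))) for row in w_hidden]
    return sigmoid(sum(w * h for w, h in zip(w_out, hidden)))


tokens = ["Dr", ".", "Smith", "arrived", ".", "He", "left", "."]
x = context_vector(tokens, 1)            # context of the period after "Dr"
w_hidden, w_out = init_weights(len(x), n_hidden=2)
score = forward(x, w_hidden, w_out)      # untrained, so the value is arbitrary
```

In a trained system of this kind, the output score would be compared against a threshold to decide whether the punctuation mark ends a sentence. Because the input is built from lexicon priors rather than from the words themselves, the input size depends only on the tag set and the context width, not on the vocabulary.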

Cite

Palmer, D. D., & Hearst, M. A. (1994). Adaptive sentence boundary disambiguation. In 4th Conference on Applied Natural Language Processing, ANLP 1994 - Proceedings (pp. 78–83). Association for Computational Linguistics (ACL). https://doi.org/10.3115/974358.974376
