This paper introduces a robust, portable system for categorizing unknown words. It is based on a multi-component architecture in which each component is responsible for identifying one class of unknown words. The focus of this paper is the component that identifies spelling errors. The misspelling identifier uses a decision tree architecture to combine multiple types of evidence about the unknown word. It is evaluated using data from live closed captions, a genre replete with a wide variety of unknown words.
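The sketch below illustrates how a decision tree can combine several kinds of evidence about an unknown word into a single misspelling / not-misspelling decision. It is not the paper's method. The features (edit distance to the closest lexicon word, capitalization, token length), the toy `LEXICON`, the thresholds, and the helper names `edit_distance`, `extract_evidence` and `is_misspelling` are all hypothetical. The tree is also written by hand here, whereas a real system would learn it from labeled data.

```python
from dataclasses import dataclass

# Hypothetical toy lexicon standing in for a real dictionary.
LEXICON = {"the", "weather", "tomorrow", "government", "receive", "president"}


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via dynamic programming (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (or match)
        prev = cur
    return prev[-1]


@dataclass
class Evidence:
    min_distance: int  # edit distance to the closest lexicon word (hypothetical feature)
    capitalized: bool  # first letter upper-case, so possibly a proper name
    length: int        # number of characters in the token


def extract_evidence(word: str, lexicon=LEXICON) -> Evidence:
    """Collect the hypothetical evidence features for one token."""
    lower = word.lower()
    return Evidence(
        min_distance=min(edit_distance(lower, w) for w in lexicon),
        capitalized=word[:1].isupper(),
        length=len(word),
    )


def is_misspelling(word: str) -> bool:
    """Hand-specified decision tree; each `if` is one internal node."""
    ev = extract_evidence(word)
    if ev.min_distance == 0:
        return False  # known word, so not an unknown-word candidate at all
    if ev.min_distance > 2:
        return False  # too far from any known word (name, new coinage, etc.)
    if ev.capitalized:
        # Capitalized near-misses may be names: accept only close, longer ones.
        return ev.min_distance == 1 and ev.length > 4
    if ev.length <= 3:
        return False  # very short tokens are often abbreviations
    return True
```

For example, `is_misspelling("wether")` returns `True` (one insertion from "weather"), while `is_misspelling("Clinton")` returns `False` because no lexicon word is within two edits. Keeping each feature as a separate field of `Evidence` mirrors the idea of feeding independent pieces of evidence to one classifier, so a learned tree could replace the hand-written branches without changing feature extraction.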
Toole, J. (1999). Categorizing unknown words: A decision tree-based misspelling identifier. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 1747, pp. 122–133). Springer Verlag. https://doi.org/10.1007/3-540-46695-9_11