Automatic Rule Induction for Unknown-Word Guessing

ISSN: 08912017

Abstract

Words unknown to the lexicon present a substantial problem to NLP modules that rely on morphosyntactic information, such as part-of-speech taggers or syntactic parsers. In this paper we present a technique for fully automatic acquisition of rules that guess possible part-of-speech tags for unknown words using their starting and ending segments. The learning is performed from a general-purpose lexicon and word frequencies collected from a raw corpus. Three complementary sets of word-guessing rules are statistically induced: prefix morphological rules, suffix morphological rules, and ending-guessing rules. Using the proposed technique, unknown-word-guessing rule sets were induced and integrated into a stochastic tagger and a rule-based tagger, which were then applied to texts with unknown words.
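
To make the idea of ending-guessing rules concrete, here is a minimal Python sketch, not the paper's actual procedure. It assumes a toy lexicon mapping words to sets of part-of-speech tags and a toy frequency table, and it keeps an ending as a rule only when one tag set dominates the frequency-weighted evidence for that ending. The function names (`induce_ending_rules`, `guess_tags`), the thresholds (`min_count`, `min_ratio`), and the simple ratio test are all illustrative assumptions; the abstract does not specify the statistical criterion used.

```python
from collections import Counter, defaultdict


def induce_ending_rules(lexicon, freqs, max_len=3, min_count=2, min_ratio=0.6):
    """Induce ending -> tag-set rules from a lexicon and corpus frequencies.

    lexicon: dict mapping word -> iterable of POS tags (hypothetical format)
    freqs:   dict mapping word -> corpus frequency (missing words count as 1)
    The thresholds are illustrative assumptions, not values from the paper.
    """
    counts = defaultdict(Counter)
    for word, tags in lexicon.items():
        weight = freqs.get(word, 1)
        for k in range(1, max_len + 1):
            if len(word) > k:  # leave at least one character before the ending
                counts[word[-k:]][frozenset(tags)] += weight

    rules = {}
    for ending, tag_counts in counts.items():
        total = sum(tag_counts.values())
        tags, best = tag_counts.most_common(1)[0]
        # Keep the rule only if the evidence is sufficient and one tag set dominates.
        if total >= min_count and best / total >= min_ratio:
            rules[ending] = tags
    return rules


def guess_tags(word, rules, max_len=3):
    """Return the tag set of the longest matching ending, or None if no rule fires."""
    for k in range(min(max_len, len(word) - 1), 0, -1):
        tags = rules.get(word[-k:])
        if tags is not None:
            return tags
    return None


# Toy data, invented purely for illustration.
lexicon = {
    "walked": {"VBD", "VBN"}, "talked": {"VBD", "VBN"}, "jumped": {"VBD", "VBN"},
    "quickly": {"RB"}, "slowly": {"RB"}, "table": {"NN"},
}
freqs = {"walked": 5, "talked": 3, "jumped": 2, "quickly": 4, "slowly": 3, "table": 6}

rules = induce_ending_rules(lexicon, freqs)
print(sorted(guess_tags("googled", rules)))  # ['VBD', 'VBN']
```

Preferring the longest matching ending is one plausible design choice: longer endings are more specific, so they are consulted first, with shorter endings serving as a fallback.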

Cite

Mikheev, A. (1997). Automatic Rule Induction for Unknown-Word Guessing. Computational Linguistics, 23(3), 405–423.