Automatic Rule Induction for Unknown-Word Guessing

Andrei Mikheev

Journal Article

Computational Linguistics (1997) 23(3) 405-423

ISSN: 08912017

100 Citations

117 Readers

Abstract

Words unknown to the lexicon present a substantial problem to NLP modules that rely on morphosyntactic information, such as part-of-speech taggers or syntactic parsers. In this paper we present a technique for fully automatic acquisition of rules that guess possible part-of-speech tags for unknown words using their starting and ending segments. The learning is performed from a general-purpose lexicon and word frequencies collected from a raw corpus. Three complementary sets of word-guessing rules are statistically induced: prefix morphological rules, suffix morphological rules and ending-guessing rules. Using the proposed technique, unknown-word-guessing rule sets were induced and integrated into a stochastic tagger and a rule-based tagger, which were then applied to texts with unknown words.
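To make the idea of ending-guessing rules concrete, here is a minimal sketch in Python. It is not the paper's procedure: the scoring (a plain frequency-weighted majority share), the thresholds `min_count` and `min_score`, the `max_len` limit, the toy lexicon, and the `NN` fallback are all assumptions chosen for illustration. The sketch only shows the general shape of the approach: collect word endings from a lexicon, weight them by corpus frequency, keep the endings that reliably predict one set of part-of-speech tags, and apply the longest matching ending to an unknown word.

```python
from collections import Counter, defaultdict


def induce_ending_rules(lexicon, freqs, max_len=3, min_count=2, min_score=0.8):
    """Induce ending -> POS-class rules (illustrative scoring, not the paper's).

    lexicon: word -> set of POS tags; freqs: word -> corpus frequency.
    """
    # votes[ending][pos_class] = frequency-weighted support for that class
    votes = defaultdict(Counter)
    for word, tags in lexicon.items():
        weight = freqs.get(word, 1)
        for n in range(1, max_len + 1):
            if len(word) > n + 1:  # leave at least two characters before the ending
                votes[word[-n:]][frozenset(tags)] += weight

    rules = {}
    for ending, counter in votes.items():
        total = sum(counter.values())
        pos_class, support = counter.most_common(1)[0]
        # Keep a rule only if it is frequent enough and accurate enough.
        if total >= min_count and support / total >= min_score:
            rules[ending] = pos_class
    return rules


def guess(word, rules, max_len=3, default=frozenset({"NN"})):
    """Return the POS class of the longest matching ending, else a default."""
    for n in range(min(max_len, len(word) - 1), 0, -1):
        if word[-n:] in rules:
            return rules[word[-n:]]
    return default  # hypothetical fallback class


# Toy data, invented for this example.
lexicon = {
    "walking": {"VBG"}, "talking": {"VBG"}, "jumping": {"VBG"},
    "quickly": {"RB"}, "slowly": {"RB"}, "king": {"NN"},
}
freqs = {"walking": 5, "talking": 3, "jumping": 2,
         "quickly": 4, "slowly": 6, "king": 1}

rules = induce_ending_rules(lexicon, freqs)
print(sorted(guess("blorping", rules)))
print(sorted(guess("zorbly", rules)))
```

Preferring the longest matching ending lets a more specific rule (such as `ing`) take precedence over a shorter, more general one (such as `g`).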

APA

Mikheev, A. (1997). Automatic Rule Induction for Unknown-Word Guessing. Computational Linguistics, 23(3), 405–423.
