Automatic Rule Induction for Unknown-Word Guessing

  • Mikheev A

Words unknown to the lexicon pose a substantial problem for NLP modules that rely on morphosyntactic information, such as part-of-speech taggers or syntactic parsers. In this paper we present a technique for fully automatic acquisition of rules that guess possible part-of-speech tags for unknown words using their starting and ending segments. The rules are learned from a general-purpose lexicon and word frequencies collected from a raw corpus. Three complementary sets of word-guessing rules are statistically induced: prefix morphological rules, suffix morphological rules and ending-guessing rules. Using the proposed technique, unknown-word-guessing rule sets were induced and integrated into a stochastic tagger and a rule-based tagger, which were then applied to texts with unknown words.
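
To make the idea of ending-guessing rules concrete, the following is a minimal sketch and not the paper's actual procedure. It assumes a toy lexicon with one tag per word and keeps an ending as a rule when a simple relative-frequency threshold is met; the paper induces its rules statistically and also uses corpus word frequencies, which are omitted here. The names `induce_ending_rules`, `guess_tag`, `toy_lexicon`, and the parameters `max_len`, `min_count`, `min_score` and `default` are all hypothetical.

```python
from collections import Counter, defaultdict


def induce_ending_rules(lexicon, max_len=3, min_count=2, min_score=0.8):
    """Collect ending -> tag rules from a {word: tag} lexicon.

    Simplification (an assumption of this sketch): each word has one tag,
    and a rule is kept when its majority tag covers at least `min_score`
    of the words with that ending and the ending occurs `min_count` times.
    """
    counts = defaultdict(Counter)
    for word, tag in lexicon.items():
        for n in range(1, max_len + 1):
            if len(word) > n:  # keep at least one character before the ending
                counts[word[-n:]][tag] += 1

    rules = {}
    for ending, tag_counts in counts.items():
        total = sum(tag_counts.values())
        tag, hits = tag_counts.most_common(1)[0]
        if total >= min_count and hits / total >= min_score:
            rules[ending] = tag
    return rules


def guess_tag(word, rules, max_len=3, default="NN"):
    """Guess a tag for an unknown word, trying the longest ending first."""
    for n in range(min(max_len, len(word) - 1), 0, -1):
        tag = rules.get(word[-n:])
        if tag is not None:
            return tag
    return default  # hypothetical fallback when no rule applies


# Toy lexicon invented for illustration only.
toy_lexicon = {
    "walking": "VBG", "talking": "VBG", "running": "VBG",
    "quickly": "RB", "slowly": "RB", "happily": "RB",
    "table": "NN", "cable": "NN",
}
rules = induce_ending_rules(toy_lexicon)
print(guess_tag("jumping", rules))
print(guess_tag("boldly", rules))
```

Trying the longest matching ending first is a design choice of this sketch: longer endings are more specific, so they are preferred over shorter, more ambiguous ones.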

Find this document

  • SCOPUS: 2-s2.0-0005496287
  • ISSN: 08912017
  • SGR: 0005496287
  • PUI: 127461227


  • Andrei Mikheev
