Unsupervised models for morpheme segmentation and morphology learning

  • Mathias Creutz
  • Krista Lagus

We present a model family called Morfessor for the unsupervised
induction of a simple morphology from raw text data. The model is
formulated in a probabilistic maximum a posteriori framework.
Morfessor can handle highly inflecting and compounding languages,
where words can consist of lengthy sequences of morphemes. A lexicon
of word segments, so-called morphs, is induced from the data.
The lexicon stores information about both the usage and form
of the morphs. Several instances of the model are evaluated
quantitatively in a morpheme segmentation task on different-sized sets
of Finnish as well as English data. Morfessor is shown to perform very
well compared to a widely known benchmark algorithm, in particular on
Finnish data.
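
As a rough illustration of what a maximum a posteriori formulation means in general, the search objective can be sketched as follows. The symbols are assumptions introduced here and do not appear in the abstract: D stands for the raw text data and M for a candidate model, which would include the morph lexicon. This is the generic MAP criterion, not necessarily the exact objective used by any particular Morfessor variant.

```latex
% Generic MAP criterion (illustrative notation, not taken from the abstract):
%   D = observed raw text data
%   M = candidate model, including the induced morph lexicon
\hat{M} \;=\; \arg\max_{M} P(M \mid D)
        \;=\; \arg\max_{M} P(D \mid M)\, P(M)
```

The second equality follows from Bayes' rule, since P(D) does not depend on M. In a formulation of this kind, the prior P(M) is the natural place to account for lexicon properties such as the usage and form of the morphs, while P(D | M) measures how well the segmentation explains the data.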
