Unsupervised Models for Morpheme Segmentation and Morphology Learning

Mathias Creutz; Krista Lagus

Journal Article

ACM Transactions on Speech and Language Processing (2007) 4(1) 1-34

DOI: 10.1145/1187415.1187418

Abstract

We present a model family called Morfessor for the unsupervised induction of a simple morphology from raw text data. The model is formulated in a probabilistic maximum a posteriori framework. Morfessor can handle highly inflecting and compounding languages, where words can consist of lengthy sequences of morphemes. A lexicon of word segments, called morphs, is induced from the data. The lexicon stores information about both the usage and the form of the morphs. Several instances of the model are evaluated quantitatively in a morpheme segmentation task on different-sized sets of Finnish and English data. © 2009, ACM. All rights reserved.
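
The sketch below illustrates the two ideas the abstract names. It is not the Morfessor implementation, and the function names are hypothetical.

The first idea is segmenting a word into morphs from a lexicon. Here this is done with Viterbi search under an assumed unigram morph model.

The second idea is the maximum a posteriori objective. It is written here as a two-part code length, -log P(lexicon) - log P(corpus | lexicon). The lexicon part covers morph form and the corpus part covers morph usage.

The cost function rests on simplifying assumptions that are not taken from the paper. Morph spellings use a uniform letter code, and morph tokens use maximum-likelihood unigram probabilities. The example words are Finnish (talo "house", talossa "in the house", talon "of the house").

```python
import math
from collections import Counter


def viterbi_segment(word, morph_logprob):
    """Most probable split of `word` into morphs under a unigram model.

    `morph_logprob` (hypothetical input) maps each known morph to its log
    probability. If no split covers the whole word, the word is kept whole.
    """
    n = len(word)
    best = [-math.inf] * (n + 1)  # best[i]: best log prob of word[:i]
    back = [0] * (n + 1)          # back[i]: start index of the last morph
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(end):
            lp = morph_logprob.get(word[start:end])
            if lp is None or best[start] == -math.inf:
                continue
            if best[start] + lp > best[end]:
                best[end] = best[start] + lp
                back[end] = start
    if best[n] == -math.inf:
        return [word]  # fallback: leave the word unsegmented
    morphs, end = [], n
    while end > 0:  # follow back-pointers from the end of the word
        morphs.append(word[back[end]:end])
        end = back[end]
    return morphs[::-1]


def map_cost(segmented_corpus, alphabet_size=26):
    """Two-part cost -log P(lexicon) - log P(corpus | lexicon), in nats.

    Simplifying assumptions: each distinct morph is spelled with a uniform
    code over the letters plus an end-of-morph marker, and morph tokens are
    coded with maximum-likelihood unigram probabilities.
    """
    counts = Counter(m for word in segmented_corpus for m in word)
    total = sum(counts.values())
    letter_cost = math.log(alphabet_size + 1)
    # Lexicon cost: spell out every distinct morph once (its "form").
    lexicon_cost = sum((len(m) + 1) * letter_cost for m in counts)
    # Corpus cost: code every morph token by its frequency (its "usage").
    corpus_cost = -sum(c * math.log(c / total) for c in counts.values())
    return lexicon_cost + corpus_cost


lp = {"talo": math.log(0.4), "ssa": math.log(0.3), "talossa": math.log(0.01)}
print(viterbi_segment("talossa", lp))  # ['talo', 'ssa']

whole = [["talo"], ["talossa"], ["talon"]]
split = [["talo"], ["talo", "ssa"], ["talo", "n"]]
print(map_cost(split) < map_cost(whole))  # True
```

In this toy corpus, reusing the morph "talo" shrinks the lexicon more than it enlarges the corpus code. That trade-off is what a MAP criterion of this kind exploits when choosing between segmentations.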

Citation (APA)

Creutz, M., & Lagus, K. (2007). Unsupervised Models for Morpheme Segmentation and Morphology Learning. ACM Transactions on Speech and Language Processing, 4(1), 1–34. https://doi.org/10.1145/1187415.1187418