Using unsupervised paradigm acquisition for prefixes

Daniel Zeman

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2009), vol. 5706 LNCS, pp. 983–990

DOI: 10.1007/978-3-642-04447-2_130

Abstract

We describe a simple method for unsupervised morpheme segmentation of words in an unknown language. All that is needed is a raw text corpus (or a list of words) in the given language. The algorithm identifies word parts that occur in many words and interprets them as morpheme candidates (prefixes, stems and suffixes). The new treatment of prefixes is the main innovation in comparison to [1]. After spurious hypotheses are filtered out, the list of morphemes is applied to segment input words. The official Morpho Challenge 2008 evaluation is given together with some additional experiments. Processing of prefixes improved the F-score by 5 to 11 points for German, Finnish and Turkish, but failed to improve it for English and Arabic. We also analyze and discuss errors with respect to the evaluation method. © 2009 Springer Berlin Heidelberg.
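The pipeline outlined in the abstract (collect frequent word parts, filter spurious candidates, segment words) can be illustrated with a toy sketch. This is not the paper's algorithm: the function names, the minimum stem length, the count threshold, and the filter that keeps an affix only when the remainder is itself an attested word are all assumptions made for illustration.

```python
from collections import defaultdict

def collect_candidates(words, min_stem_len=3):
    """Split every word at every position and record affix candidates.

    Returns two dicts mapping a candidate prefix (or suffix) to the set of
    remainders it was observed with. min_stem_len is an assumed parameter.
    """
    prefix_rests = defaultdict(set)
    suffix_stems = defaultdict(set)
    for w in set(words):
        for i in range(1, len(w)):
            head, tail = w[:i], w[i:]
            if len(tail) >= min_stem_len:
                prefix_rests[head].add(tail)   # head is a prefix candidate
            if len(head) >= min_stem_len:
                suffix_stems[tail].add(head)   # tail is a suffix candidate
    return prefix_rests, suffix_stems

def filter_affixes(candidates, vocabulary, min_count=2):
    """Assumed filter: keep an affix only if at least min_count of its
    remainders are themselves words in the vocabulary."""
    return {affix for affix, rests in candidates.items()
            if sum(r in vocabulary for r in rests) >= min_count}

def segment(word, prefixes, suffixes, min_stem_len=3):
    """Strip at most one known prefix and one known suffix (longest first)."""
    before, after, stem = [], [], word
    for p in sorted(prefixes, key=len, reverse=True):
        if stem.startswith(p) and len(stem) - len(p) >= min_stem_len:
            before.append(p)
            stem = stem[len(p):]
            break
    for s in sorted(suffixes, key=len, reverse=True):
        if stem.endswith(s) and len(stem) - len(s) >= min_stem_len:
            after.append(s)
            stem = stem[:-len(s)]
            break
    return before + [stem] + after

# Toy word list standing in for a raw corpus (hypothetical data).
words = ["play", "plays", "played", "replay", "replayed",
         "work", "works", "worked", "rework", "reworked"]
vocabulary = set(words)
prefix_rests, suffix_stems = collect_candidates(words)
prefixes = filter_affixes(prefix_rests, vocabulary)
suffixes = filter_affixes(suffix_stems, vocabulary)

print(sorted(prefixes))                         # ['re']
print(sorted(suffixes))                         # ['ed', 's']
print(segment("reworked", prefixes, suffixes))  # ['re', 'work', 'ed']
```

On realistic corpora a filter this crude would admit many spurious affixes; the sketch only shows the overall shape of candidate collection, filtering and segmentation.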

Cite

Zeman, D. (2009). Using unsupervised paradigm acquisition for prefixes. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5706 LNCS, pp. 983–990). https://doi.org/10.1007/978-3-642-04447-2_130
