Unsupervised morphological segmentation and clustering with document boundaries

Abstract

Many approaches to unsupervised morphology acquisition incorporate the frequency of character sequences with respect to each other to identify word stems and affixes. This typically involves heuristic search procedures and calibrating multiple arbitrary thresholds. We present a simple approach that uses no thresholds other than those involved in standard application of χ² significance testing. A key part of our approach is using document boundaries to constrain generation of candidate stems and affixes and clustering morphological variants of a given word stem. We evaluate our model on English and the Mayan language Uspanteko; it compares favorably to two benchmark systems which use considerably more complex strategies and rely more on experimentally chosen threshold values. © 2009 ACL and AFNLP.
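The two ingredients named in the abstract, candidate generation restricted to a single document and a χ² significance test, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the longest-common-prefix heuristic, and the `min_stem_len` parameter are assumptions made for the example, and the abstract does not say which contingency table the paper tests. The only fixed number is the standard χ² critical value for one degree of freedom at the 0.05 level.

```python
from itertools import combinations

# Standard chi-square critical value for 1 degree of freedom at alpha = 0.05.
CHI2_CRITICAL_DF1_P05 = 3.841


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 contingency table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # A row or column is empty, so the statistic is undefined; treat as no evidence.
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def candidate_splits(documents, min_stem_len=3):
    """Propose (stem, suffix_a, suffix_b) triples from word pairs in the same document.

    Only words that co-occur inside one document are compared, so the document
    boundary limits the candidate space. `min_stem_len` is a hypothetical
    simplification for this toy example, not something taken from the paper.
    """
    candidates = set()
    for doc in documents:
        vocab = sorted(set(doc))
        for w1, w2 in combinations(vocab, 2):
            # Length of the longest common prefix of the two words.
            k = 0
            while k < min(len(w1), len(w2)) and w1[k] == w2[k]:
                k += 1
            if k >= min_stem_len:
                candidates.add((w1[:k], w1[k:], w2[k:]))
    return candidates


def is_significant(a, b, c, d, critical=CHI2_CRITICAL_DF1_P05):
    """True when the 2x2 table's chi-square statistic exceeds the critical value."""
    return chi_square_2x2(a, b, c, d) > critical
```

In this sketch, "walked" and "walking" yield the candidate stem "walk" only if both words occur in the same document; how the counts for the contingency table are collected is left open, since the abstract does not specify it.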

Cite

Moon, T., Erk, K., & Baldridge, J. (2009). Unsupervised morphological segmentation and clustering with document boundaries. In EMNLP 2009 - Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: A Meeting of SIGDAT, a Special Interest Group of ACL, Held in Conjunction with ACL-IJCNLP 2009 (pp. 668–677). Association for Computational Linguistics (ACL). https://doi.org/10.3115/1699571.1699600
