Unsupervised morphological segmentation and clustering with document boundaries

Taesun Moon; Katrin Erk; Jason Baldridge

Conference Proceedings (Open Access)

EMNLP 2009 - Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: A Meeting of SIGDAT, a Special Interest Group of ACL, Held in Conjunction with ACL-IJCNLP 2009 (2009) 668-677

DOI: 10.3115/1699571.1699600

6 Citations

81 Readers

Abstract

Many approaches to unsupervised morphology acquisition incorporate the frequency of character sequences with respect to each other to identify word stems and affixes. This typically involves heuristic search procedures and calibrating multiple arbitrary thresholds. We present a simple approach that uses no thresholds other than those involved in standard application of χ² significance testing. A key part of our approach is using document boundaries to constrain generation of candidate stems and affixes and clustering morphological variants of a given word stem. We evaluate our model on English and the Mayan language Uspanteko; it compares favorably to two benchmark systems which use considerably more complex strategies and rely more on experimentally chosen threshold values. © 2009 ACL and AFNLP.
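The abstract names two ingredients: candidate stems and affixes generated only within document boundaries, and a standard χ² significance test. The following is a minimal Python sketch of those two ideas under stated assumptions, not the authors' implementation. The function names, the shared-prefix heuristic, the `min_stem` parameter, and the 2×2 contingency table are all hypothetical illustrations; the abstract does not say which counts the paper actually tests.

```python
from itertools import combinations


def common_prefix(a, b):
    """Return the longest shared prefix of two strings."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return a[:n]


def candidate_pairs(documents, min_stem=3):
    """Collect (stem, suffix_a, suffix_b) triples from word pairs that
    occur in the same document. Words in different documents are never
    paired. `min_stem` is an illustrative assumption, not a value taken
    from the paper."""
    candidates = set()
    for doc in documents:
        vocab = sorted(set(doc))
        for a, b in combinations(vocab, 2):
            stem = common_prefix(a, b)
            if len(stem) >= min_stem:
                candidates.add((stem, a[len(stem):], b[len(stem):]))
    return candidates


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]].
    What the cells count is left to the caller; this is only the
    textbook formula."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


# Standard critical value for one degree of freedom at p = 0.05.
CHI2_CRITICAL_DF1_P05 = 3.841

# Toy corpus: "car" and "card" appear in different documents,
# so they are never proposed as variants of one stem.
docs = [["car", "cars"], ["card", "cards"]]
pairs = candidate_pairs(docs)
```

In this toy corpus the document boundary keeps "car" and "card" apart, so only the within-document pairs ("car", "cars") and ("card", "cards") yield candidates. A table such as `chi_square_2x2(10, 0, 0, 10)` exceeds `CHI2_CRITICAL_DF1_P05`, while a uniform table such as `chi_square_2x2(5, 5, 5, 5)` does not.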

Citation (APA)

Moon, T., Erk, K., & Baldridge, J. (2009). Unsupervised morphological segmentation and clustering with document boundaries. In EMNLP 2009 - Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: A Meeting of SIGDAT, a Special Interest Group of ACL, Held in Conjunction with ACL-IJCNLP 2009 (pp. 668–677). Association for Computational Linguistics (ACL). https://doi.org/10.3115/1699571.1699600
