Abstract
This paper describes a variety of nonparametric Bayesian models of word segmentation based on Adaptor Grammars. The models capture different aspects of the input and incorporate different kinds of prior knowledge, and we apply them to the Bantu language Sesotho. Although overall word segmentation accuracy is lower than these models achieve on English, we also find interesting differences in which factors contribute to better segmentation. Specifically, modeling contextual dependencies yielded little improvement in word segmentation accuracy, whereas modeling morphological structure did improve it.
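In an Adaptor Grammar, an "adapted" nonterminal such as Word caches the subtrees it generates and reuses them with probability proportional to their past frequency, using a Pitman-Yor process. The following is a minimal sketch of that caching step for a single adapted nonterminal. It is not the paper's model or sampler. The base distribution `base_prob`, the parameter values, and the simplification that each distinct cached word occupies exactly one Pitman-Yor "table" are all illustrative assumptions.

```python
from collections import Counter
from typing import Callable, Mapping


def base_prob(word: str, alphabet_size: int = 30, p_stop: float = 0.5) -> float:
    """Hypothetical base distribution P0: geometric word length, uniform segments.

    Sums to 1 over all non-empty strings, so the adaptor below is normalized too.
    """
    return p_stop * (1 - p_stop) ** (len(word) - 1) * (1.0 / alphabet_size) ** len(word)


def py_adaptor_prob(word: str, cache: Mapping[str, int], a: float, b: float,
                    p0: Callable[[str], float] = base_prob) -> float:
    """Pitman-Yor predictive probability of `word` given previously generated words.

    Assumes 0 <= a < 1 and b > 0. Simplification (assumption): one table per
    distinct cached word, so the table count K equals len(cache).
    """
    n = sum(cache.values())   # tokens generated so far by this adapted nonterminal
    k = len(cache)            # number of tables (one per distinct word here)
    n_w = cache.get(word, 0)  # how often `word` has been generated before
    reuse = n_w - a if n_w > 0 else 0.0   # mass for reusing the cached word
    new = (k * a + b) * p0(word)          # mass for generating afresh from P0
    return (reuse + new) / (n + b)


# Usage: frequently reused words become much more probable than unseen ones.
cache = Counter(["ke", "ke", "ke", "a"])
print(py_adaptor_prob("ke", cache, a=0.5, b=1.0))
print(py_adaptor_prob("ho", cache, a=0.5, b=1.0))
```

This "rich get richer" reuse is what lets adapted nonterminals learn a lexicon of reusable units from unsegmented input.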
Citation
Johnson, M. (2008). Unsupervised word segmentation for Sesotho using adaptor grammars. In SIGMORPHON 2008 - 10th Meeting of the ACL Special Interest Group on Computational Morphology and Phonology, Proceedings of the Workshop (pp. 20–27). Association for Computational Linguistics (ACL). https://doi.org/10.3115/1626324.1626328