Abstract
Developing better methods for segmenting continuous text into words is important for improving the processing of Asian languages, and may shed light on how humans learn to segment speech. We propose two new Bayesian word segmentation methods that assume unigram and bigram models of word dependencies respectively. The bigram model greatly outperforms the unigram model (and previous probabilistic models), demonstrating the importance of such dependencies for word segmentation. We also show that previous probabilistic models rely crucially on suboptimal search procedures.
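To make the abstract's central claim concrete, here is a minimal toy sketch in Python of the difference between scoring a segmentation with a unigram model (words are independent) and a bigram model (each word is conditioned on the previous one). This is only an illustration of the scoring contrast, not the paper's actual method: Goldwater et al. place Dirichlet-process priors over the lexicon and learn probabilities with Gibbs sampling, whereas the probabilities below are invented by hand.

```python
from itertools import product

def segmentations(s):
    """Yield every way to split string s into contiguous words."""
    n = len(s)
    # Each of the n-1 gaps between characters is either a boundary or not.
    for cuts in product([False, True], repeat=n - 1):
        words, start = [], 0
        for i, cut in enumerate(cuts, start=1):
            if cut:
                words.append(s[start:i])
                start = i
        words.append(s[start:])
        yield words

# Hand-picked toy probabilities for illustration only; the paper *learns*
# these quantities from data rather than assuming them.
P_uni = {"the": 0.3, "dog": 0.2, "thedog": 0.1, "th": 0.05, "edog": 0.05}
P_bi = {("<s>", "the"): 0.4, ("the", "dog"): 0.5, ("<s>", "thedog"): 0.1}

def unigram_score(words, floor=1e-6):
    # P(w1..wn) = prod_i P(wi): every word is generated independently.
    p = 1.0
    for w in words:
        p *= P_uni.get(w, floor)
    return p

def bigram_score(words, floor=1e-6):
    # P(w1..wn) = prod_i P(wi | w_{i-1}): words depend on their left neighbor.
    p, prev = 1.0, "<s>"
    for w in words:
        p *= P_bi.get((prev, w), floor)
        prev = w
    return p

print(max(segmentations("thedog"), key=unigram_score))  # ['thedog']
print(max(segmentations("thedog"), key=bigram_score))   # ['the', 'dog']
```

With these toy numbers the unigram model prefers the undersegmented chunk "thedog" (0.1 vs. 0.3 × 0.2 = 0.06), while the bigram model prefers the correct split (0.4 × 0.5 = 0.2 vs. 0.1). This mirrors, in miniature, the paper's finding that ignoring word-to-word dependencies pushes a unigram learner toward undersegmentation.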
Citation
Goldwater, S., Griffiths, T. L., & Johnson, M. (2006). Contextual dependencies in unsupervised word segmentation. In COLING/ACL 2006 - 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Vol. 1, pp. 673–680). Association for Computational Linguistics (ACL). https://doi.org/10.3115/1220175.1220260