Word segmentation: Quick but not dirty

  • Gambell T
  • Yang C

Abstract

When we listen to speech, we hear a sequence of words, but when we speak, we-do-not-separate-words-by-pauses. A first step in learning the words of a language, then, is to extract words from continuous speech. The current study presents a series of computational models that may shed light on the precise mechanisms of word segmentation. We begin with a brief review of the literature on word segmentation, enumerating several well-supported strategies that the child may use to extract words. We note, however, that the underlying assumptions of some of these strategies are not always spelled out and, moreover, that the relative contributions of these strategies to successful word segmentation remain somewhat obscure. It is also still an open question how such strategies, which are primarily established in the laboratory, would scale up in a realistic setting of language acquisition. The computational models in the present study aim to address these questions. Specifically, using data from child-directed English speech, we demonstrate the inadequacies of several strategies for word segmentation. More positively, we demonstrate how some of these strategies can in fact lead to high-quality segmentation results when complemented by linguistic constraints and/or additional learning mechanisms. We conclude with some general remarks on the interaction between experience-based learning and innate linguistic knowledge in language acquisition.

Cite

Gambell, T., & Yang, C. (2006). Word segmentation: Quick but not dirty. Unpublished Manuscript, 1–36. Retrieved from http://www.ling.upenn.edu/~ycharles/papers/quick.pdf
