A Maximum Entropy Model of Phonotactics and Phonotactic Learning

by Bruce Hayes, Colin Wilson
Linguistic Inquiry (2008)

Abstract


Available from www.mitpressjournals.org

The study of phonotactics is a central topic in phonology. We propose
a theory of phonotactic grammars and a learning algorithm that constructs
such grammars from positive evidence. Our grammars consist
of constraints that are assigned numerical weights according to the
principle of maximum entropy. The grammars assess possible words
on the basis of the weighted sum of their constraint violations. The
learning algorithm yields grammars that can capture both categorical
and gradient phonotactic patterns. The algorithm is not provided with
constraints in advance, but uses its own resources to form constraints
and weight them. A baseline model, in which Universal Grammar is
reduced to a feature set and an SPE-style constraint format, suffices
to learn many phonotactic phenomena. In order for the model to learn
nonlocal phenomena such as stress and vowel harmony, it must be
augmented with autosegmental tiers and metrical grids. Our results
thus offer novel, learning-theoretic support for such representations.
We apply the model in a variety of learning simulations, showing
that the learned grammars capture the distributional generalizations of
these languages and accurately predict the findings of a phonotactic
experiment.
Keywords: phonotactics, maximum entropy, learnability, onsets,
Shona, Wargamay
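
The scoring scheme summarized in the abstract can be illustrated with a minimal sketch. It assumes the usual maximum entropy formulation, in which a form's probability is proportional to the exponential of its negated penalty score. The two constraints, their weights, the use of spelling in place of phonological features, and the three-word candidate set are all hypothetical choices made for illustration; none of them is taken from the learned grammars.

```python
import math

# Hypothetical toy constraints: each maps a string to a violation count.
# They operate on spelling, not on phonological features (an assumption).
def initial_bn(word):
    """One violation if the word begins with the sequence 'bn'."""
    return 1 if word.startswith("bn") else 0

def no_vowel(word):
    """One violation if the word contains no vowel letter."""
    return 0 if any(ch in "aeiou" for ch in word) else 1

# Illustrative weights chosen by hand, not learned from data.
CONSTRAINTS = [(initial_bn, 4.0), (no_vowel, 6.0)]

def penalty_score(word):
    """Weighted sum of the word's constraint violations."""
    return sum(weight * constraint(word) for constraint, weight in CONSTRAINTS)

def maxent_value(word):
    """Unnormalized value: exp of the negated penalty score."""
    return math.exp(-penalty_score(word))

def probabilities(candidates):
    """Normalize maxent values over a finite candidate set."""
    values = {word: maxent_value(word) for word in candidates}
    total = sum(values.values())
    return {word: value / total for word, value in values.items()}

probs = probabilities(["brick", "blick", "bnick"])
for word, p in probs.items():
    print(word, penalty_score(word), p)
```

Under these assumed weights, brick and blick violate nothing and so receive equal probability, while bnick is penalized for its initial cluster and receives a lower one. Because the weights are real-valued, the same mechanism can express gradient as well as near-categorical preferences.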
1 Introduction
In one of the central articles from the early history of generative phonology, Chomsky and Halle
(1965) lay out a research program for the theory of phonotactics. They begin with the observation
that the logically possible sequences of English phonemes can be divided into three categories:
(1) a. Existing words, such as brick;
b. Nonexisting words that are judged by native speakers to be well formed, such as
blick; and
c. Nonexisting words that are judged by native speakers to be ill formed, such as bnick.
We would like to thank two anonymous LI reviewers, Steven Abney, Paul Boersma, Michael Hammond, Robert
Kirchner, Robert Malouf, Joe Pater, Donca Steriade, Kie Zuraw, and audiences at the University of Michigan, the University
of California at San Diego, the University of Arizona, and UCLA for helpful input on our project. Special thanks to
Jason Eisner for alerting us to the feasibility of using finite state machines to formalize the computations of our model.
Linguistic Inquiry, Volume 39, Number 3, Summer 2008
379–440
© 2008 by the Massachusetts Institute of Technology
