Good-Turing Frequency Estimation Without Tears.

by Geoffrey Sampson, William A Gale
Journal of Quantitative Linguistics (1995)

Abstract

Linguists and speech researchers who use statistical methods often need to estimate the frequency of some type of item in a population containing items of various types. A common approach is to divide the number of cases observed in a sample by the size of the sample; sometimes small positive quantities are added to divisor and dividend in order to avoid zero estimates for types missing from the sample. These approaches are obvious and simple, but they lack principled justification, and yield estimates that can be wildly inaccurate. I.J. Good and Alan Turing developed a family of theoretically well-founded techniques appropriate to this domain. Some versions of the Good-Turing approach are very demanding computationally, but we define a version, the Simple Good-Turing estimator, which is straightforward to use. Tested on a variety of natural-language-related data sets, the Simple Good-Turing estimator performs well, absolutely and relative both to the approaches just discussed and to other, more sophisticated techniques.
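
The contrast drawn in the abstract can be illustrated with a minimal Python sketch. It is not the authors' Simple Good-Turing estimator: it shows only the two simple approaches the abstract describes, plus the basic, unsmoothed Turing estimate r* = (r + 1) N_{r+1} / N_r, where N_r is the number of types seen exactly r times, and the estimated total probability of unseen types, N_1 / N. The function names, the toy sample, and the assumed number of types (6) are hypothetical choices made for illustration.

```python
from collections import Counter

def mle_estimates(counts):
    """Observed count divided by sample size; unseen types get zero."""
    n = sum(counts.values())
    return {t: c / n for t, c in counts.items()}

def additive_estimate(counts, item, num_types, k=1.0):
    """Add a small positive quantity k to every count (assumes num_types is known)."""
    n = sum(counts.values())
    return (counts.get(item, 0) + k) / (n + k * num_types)

def turing_estimates(counts):
    """Unsmoothed Turing estimates.

    Returns (p0, r_star): p0 = N_1 / N is the total probability assigned
    to all unseen types; r_star[r] = (r + 1) * N_{r+1} / N_r is the adjusted
    count for types seen r times. r_star omits any r with N_{r+1} == 0,
    which is the gap that smoothing of the N_r values is meant to fill.
    """
    n = sum(counts.values())
    freq_of_freq = Counter(counts.values())  # N_r: number of types seen r times
    p0 = freq_of_freq.get(1, 0) / n
    r_star = {}
    for r, n_r in freq_of_freq.items():
        n_next = freq_of_freq.get(r + 1, 0)
        if n_next > 0:
            r_star[r] = (r + 1) * n_next / n_r
    return p0, r_star

# Toy sample (hypothetical): five observed types, eight tokens.
sample = "a a a b b c d e".split()
counts = Counter(sample)

print(mle_estimates(counts))
print(additive_estimate(counts, "z", num_types=6))  # "z" never occurs in the sample
print(turing_estimates(counts))
```

In this toy sample the most frequent type has no adjusted count, because no type occurs one more time than it does. Raw N_r values become sparse and unreliable at higher r, which is why practical Good-Turing variants smooth them before applying the formula.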
