Better estimates from binned income data: Interpolated CDFs and mean-matching

Abstract

Researchers often estimate income statistics from summaries that report the number of incomes in bins such as $0 to 10,000, $10,001 to 20,000, ..., $200,000+. Some analysts assign incomes to bin midpoints, but this treats income as discrete. Other analysts fit a continuous parametric distribution, but the distribution may not fit well. We fit nonparametric continuous distributions that reproduce the bin counts perfectly by interpolating the cumulative distribution function (CDF). We also show how both midpoints and interpolated CDFs can be constrained to reproduce the mean of income when it is known. We evaluate the methods in estimating the Gini coefficients of all 3,221 U.S. counties. Fitting parametric distributions is very slow. Fitting interpolated CDFs is much faster and slightly more accurate. Both interpolated CDFs and midpoints give dramatically better estimates if constrained to match a known mean. We have implemented interpolated CDFs in the "binsmooth" package for R. We have implemented the midpoint method in the "rpme" command for Stata. Both implementations can be constrained to match a known mean.
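To make the mean-matching idea concrete, here is a minimal Python sketch. It is not the code of "binsmooth" or "rpme"; the function names, the toy bin counts, and the known mean of 40,000 are all assumptions made for illustration. It uses the simplest interpolated CDF, a linear interpolation that spreads incomes uniformly within each bin, and picks the upper bound of the open-ended top bin so that the fitted distribution reproduces the known mean.

```python
def mean_matched_top_value(lower_bounds, counts, known_mean):
    """Income to assign to the open-ended top bin so the grand mean matches.

    lower_bounds[i] is the lower edge of bin i; the last bin has no upper edge.
    Closed bins are represented by their midpoints.
    """
    mids = [(a + b) / 2 for a, b in zip(lower_bounds[:-1], lower_bounds[1:])]
    total = sum(counts)
    closed_sum = sum(c * m for c, m in zip(counts[:-1], mids))
    top_value = (total * known_mean - closed_sum) / counts[-1]
    if top_value <= lower_bounds[-1]:
        raise ValueError("known mean is too low to be matched by the top bin")
    return top_value


def gini_step_density(lower_bounds, counts, top_upper):
    """Gini coefficient when incomes are uniform within each bin.

    Uses G = E|X - Y| / (2 * mean). For two different (non-overlapping) bins,
    E|X - Y| is the distance between midpoints; within one bin of width w,
    it is w / 3.
    """
    edges = list(lower_bounds) + [top_upper]
    total = sum(counts)
    p = [c / total for c in counts]
    mids = [(a + b) / 2 for a, b in zip(edges[:-1], edges[1:])]
    widths = [b - a for a, b in zip(edges[:-1], edges[1:])]
    mean = sum(pi * mi for pi, mi in zip(p, mids))
    between = sum(
        p[i] * p[j] * abs(mids[i] - mids[j])
        for i in range(len(p))
        for j in range(len(p))
    )
    within = sum(pi ** 2 * w / 3 for pi, w in zip(p, widths))
    return (between + within) / (2 * mean)


# Toy data (invented): bins $0-10,000, $10,000-20,000, $20,000-200,000, $200,000+
lower_bounds = [0, 10_000, 20_000, 200_000]
counts = [30, 50, 15, 5]
known_mean = 40_000

top_value = mean_matched_top_value(lower_bounds, counts, known_mean)
# A uniform top bin with mean top_value has upper bound 2 * top_value - lower edge.
top_upper = 2 * top_value - lower_bounds[-1]
gini = gini_step_density(lower_bounds, counts, top_upper)
print(top_value, top_upper, round(gini, 3))
```

In this sketch the midpoint method and the step-density method share the same mean by construction, because a uniform bin's mean is its midpoint. The two differ in the Gini estimate, since the step density also counts inequality within each bin.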

Cite

von Hippel, P. T., Hunter, D. J., & Drown, M. (2017). Better estimates from binned income data: Interpolated CDFs and mean-matching. Sociological Science, 4, 641–655. https://doi.org/10.15195/v4.a26
