Big data and large sample size: A cautionary note on the potential for bias

Abstract

A number of commentaries have suggested that large studies are more reliable than smaller studies and there is a growing interest in the analysis of "big data" that integrates information from many thousands of persons and/or different data sources. We consider a variety of biases that are likely in the era of big data, including sampling error, measurement error, multiple comparisons errors, aggregation error, and errors associated with the systematic exclusion of information. Using examples from epidemiology, health services research, studies on determinants of health, and clinical trials, we conclude that it is necessary to exercise greater caution to be sure that big sample size does not lead to big inferential errors. Despite the advantages of big studies, large sample size can magnify the bias associated with error resulting from sampling or study design. © 2014 Wiley Periodicals, Inc.
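
The closing claim, that a large sample can make a biased estimate look more trustworthy, can be illustrated with a small simulation. The sketch below is not taken from the article. The population (mean 50, SD 10), the selection probabilities (0.6 versus 0.3), the sample sizes, and the function names `mean_with_ci` and `simulate` are all assumptions chosen for illustration. It compares a large sample collected with outcome-dependent selection against a small simple random sample drawn from the same population.

```python
import math
import random
import statistics


def mean_with_ci(sample, z=1.96):
    """Return (mean, lower, upper) using a normal-approximation 95% interval."""
    m = statistics.fmean(sample)
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return m, m - z * se, m + z * se


def simulate(seed=0, population_size=200_000, small_n=500):
    """Compare a large biased sample with a small random sample (illustrative only)."""
    rng = random.Random(seed)

    # Hypothetical population: an outcome with mean 50 and SD 10 (assumed values).
    population = [rng.gauss(50, 10) for _ in range(population_size)]
    true_mean = statistics.fmean(population)

    # Large "convenience" sample: units above the population mean are twice as
    # likely to be recorded as units below it (assumed selection mechanism).
    big_biased = [
        x for x in population
        if rng.random() < (0.6 if x > true_mean else 0.3)
    ]

    # Small simple random sample: every unit has the same chance of selection.
    small_random = rng.sample(population, small_n)

    return {
        "true_mean": true_mean,
        "big_n": len(big_biased),
        "big_biased": mean_with_ci(big_biased),
        "small_n": small_n,
        "small_random": mean_with_ci(small_random),
    }


if __name__ == "__main__":
    result = simulate()
    print(f"true mean: {result['true_mean']:.2f}")
    for label in ("big_biased", "small_random"):
        m, lo, hi = result[label]
        print(f"{label}: mean={m:.2f}, 95% CI=({lo:.2f}, {hi:.2f})")
```

Under these assumptions, the large sample yields a much narrower confidence interval than the small one, yet that interval is centered away from the true mean. The added observations shrink the random error but leave the selection bias untouched, so the estimate becomes precisely wrong rather than approximately right.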

Citation (APA)

Kaplan, R. M., Chambers, D. A., & Glasgow, R. E. (2014). Big data and large sample size: A cautionary note on the potential for bias. Clinical and Translational Science. Blackwell Publishing Ltd. https://doi.org/10.1111/cts.12178

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free