IMPORTANCE The use and misuse of P values has generated extensive debates.
OBJECTIVE To evaluate in large scale the P values reported in the abstracts and full text of biomedical research articles over the past 25 years and determine how frequently statistical information is presented in ways other than P values.
DESIGN Automated text-mining analysis was performed to extract data on P values reported in 12 821 790 MEDLINE abstracts and in 843 884 abstracts and full-text articles in PubMed Central (PMC) from 1990 to 2015. Reporting of P values in 151 English-language core clinical journals and specific article types as classified by PubMed also was evaluated. A random sample of 1000 MEDLINE abstracts was manually assessed for reporting of P values and other types of statistical information; of those abstracts reporting empirical data, 100 articles were also assessed in full text.
MAIN OUTCOMES AND MEASURES P values reported.
RESULTS Text mining identified 4 572 043 P values in 1 608 736 MEDLINE abstracts and 3 438 299 P values in 385 393 PMC full-text articles. Reporting of P values in abstracts increased from 7.3% in 1990 to 15.6% in 2014. In 2014, P values were reported in 33.0% of abstracts from the 151 core clinical journals (n = 29 725 abstracts), 35.7% of meta-analyses (n = 5620), 38.9% of clinical trials (n = 4624), 54.8% of randomized controlled trials (n = 13 544), and 2.4% of reviews (n = 71 529). The distribution of reported P values in abstracts and in full text showed strong clustering at P values of .05 and of .001 or smaller. Over time, the "best" (most statistically significant) reported P values were modestly smaller and the "worst" (least statistically significant) reported P values became modestly less significant. Among the MEDLINE abstracts and PMC full-text articles with P values, 96% reported at least 1 P value of .05 or lower, with the proportion remaining steady over time in PMC full-text articles.
In 1000 abstracts that were manually reviewed, 796 were from articles reporting empirical data; P values were reported in 15.7% (125/796 [95% CI, 13.2%-18.4%]) of abstracts, confidence intervals in 2.3% (18/796 [95% CI, 1.3%-3.6%]), Bayes factors in 0% (0/796 [95% CI, 0%-0.5%]), effect sizes in 13.9% (111/796 [95% CI, 11.6%-16.5%]), other information that could lead to estimation of P values in 12.4% (99/796 [95% CI, 10.2%-14.9%]), and qualitative statements about significance in 18.1% (181/1000 [95% CI, 15.8%-20.6%]); only 1.8% (14/796 [95% CI, 1.0%-2.9%]) of abstracts reported at least 1 effect size and at least 1 confidence interval. Among 99 manually extracted full-text articles with data, 55 reported P values, 4 presented confidence intervals for all reported effect sizes, none used Bayesian methods, 1 used false-discovery rates, 3 used sample size/power calculations, and 5 specified the primary outcome.
CONCLUSIONS AND RELEVANCE In this analysis of P values reported in MEDLINE abstracts and in PMC articles from 1990-2015, more MEDLINE abstracts and articles reported P values over time, almost all abstracts and articles with P values reported statistically significant results, and, in a subgroup analysis, few articles included confidence intervals, Bayes factors, or effect sizes. Rather than reporting isolated P values, articles should include effect sizes and uncertainty metrics.
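ILLUSTRATIVE SKETCH The abstract does not describe how the text-mining step was implemented. The Python sketch below shows one way P values might be pulled from free text and checked against the .05 threshold. The regular expression and the names `P_VALUE_RE`, `extract_p_values`, and `reports_significant` are assumptions made for illustration, not the authors' pipeline; the pattern covers only simple forms such as "P = .03" or "p<0.001" and would miss many real-world variants.

```python
import re

# Hypothetical pattern (an assumption, not the authors' method):
# the letter P, an optional "value"/"-value", a comparison operator,
# then a decimal number such as .03 or 0.001.
P_VALUE_RE = re.compile(
    r"""
    \b[Pp]\s*                    # the letter P in either case
    (?:-?\s*values?\s*)?         # optional "value", "-value", "values"
    (?P<op><=|>=|[<>=≤≥])\s*     # comparison operator
    (?P<num>\d*\.\d+)            # decimal number, leading zero optional
    """,
    re.VERBOSE,
)


def extract_p_values(text):
    """Return (operator, value) pairs for each P value found in text."""
    return [
        (m.group("op"), float(m.group("num")))
        for m in P_VALUE_RE.finditer(text)
    ]


def reports_significant(text, threshold=0.05):
    """True if at least one P value is reported at or below threshold."""
    return any(
        op in ("<", "<=", "≤", "=") and value <= threshold
        for op, value in extract_p_values(text)
    )


sample = (
    "Mortality was lower with treatment (P = .03) but not "
    "readmission (P > .05); p<0.001 for trend."
)
print(extract_p_values(sample))
print(reports_significant(sample))
```

Keeping the operator alongside the number matters for a summary such as "at least 1 P value of .05 or lower": a report of "P > .05" contains the number .05 but should not count as statistically significant.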
Chavalarias, D., Wallach, J. D., Li, A. H. T., & Ioannidis, J. P. A. (2016). Evolution of reporting P values in the biomedical literature, 1990-2015. JAMA - Journal of the American Medical Association, 315(11), 1141–1148. https://doi.org/10.1001/jama.2016.1952