Science as a public enterprise: the case for open data.
The past half-century has seen a breathtaking surge in scientific discovery. We have put satellites into orbit and probed the universe; we have discovered the chemical structure of living organisms and learned to manipulate it; we have been able to read the Earth's history in minute detail from ice-sheet and ocean cores; and we have improved human and animal health through increasingly large epidemiological studies and clinical trials. All of these achievements have necessitated new ways of collecting, storing, manipulating, and transmitting information that far surpass anything previously dreamed of. The information technology that permits this has not only removed geographical barriers but also put immense power to access, manipulate, and communicate information (including scientific data) in the hands of ordinary citizens.

Yet many of the habits of scientists have barely changed since the 18th century. Driven by curiosity, they have typically pursued their research, published their findings, usually in peer-reviewed journals, filed their data, and then moved on. But such detachment is now questionable as science profoundly changes the lives of citizens, and scientists collect more and more evidence of the human assault on the natural systems of the planet. Science has become woven into the fabric of modern civilisation and should be, and be seen to be, a public enterprise, not a private enterprise done behind closed laboratory doors.

Despite the spectacular advances of science, there is an emerging undercurrent of criticism about the accessibility of data on which scientists base their conclusions and on which policy or regulatory decisions are made. Scientists have tended to regard their data as personal property. After all, it is they who worked hard to generate it, and ownership has never been seriously challenged. But there are now strong reasons for a much more open attitude to data.
First, technology has made computer code and large datasets more important to science and has opened up the prospect of sharing code and data at the click of a mouse. Second, there is public interest in making data available to other scientists to validate findings or re-use the data in new ways to advance knowledge [1, 2]. Third, much modern science is created using public funds, which should oblige scientists to maximise the utility of their findings for the public good. And last, but by no means least, there are many competent members of the public who wish to test for themselves some of the pronouncements of scientists by analysing the data on which such pronouncements are based.

Conventional peer-reviewed publications generally provide summaries of the available data, but not effective access to data in a useable format. Increasing calls for greater accessibility have come not only from peer reviewers and those who wish data to be more efficiently used [3, 4], but also from citizens who wish to interrogate scientific conclusions in depth [5]. The latter in particular have often been frustrated by the apparent resistance of scientists to the release of data, and are increasingly making use of freedom of information laws to obtain it. Recent high-profile cases in the UK include the global temperature data sought from the University of East Anglia [6], which culminated in the so-called Climategate affair, and the tree-ring data series eventually obtained from Queen's University Belfast through the intervention of the Information Commissioner [7].

Biomedical scientists have a mixed track record in relation to transparency and data-sharing. On the one hand, the principles agreed at the genome-sequencing meeting [8] in Bermuda, in 1997, paved the way for open data-sharing among the genome science community.
On the other hand, persistent failures to place even summary results of clinical trials in the public domain have led to publication bias that seriously undermines those seeking to undertake systematic reviews [9, 10]. The value of routinely sharing the results of clinical trials would be immense. The meta-analysis of the raw data from clinical trials of the effects of aspirin in the prevention of cardiovascular disease, which included data from 95 000 patients, is a fine example of the benefits of data-sharing [11]. With the increasing use of electronic medical records, there is the opportunity for anonymised data from routine clinical use of drugs to provide high-quality pharmacovigilance on a hitherto unprecedented scale.

A regime of open access to scientific data does, however, pose problematic questions (panel). Because of the importance and timeliness of the issues, the UK's Royal Society has established a Working Group to explore them in depth and to make recommendations about how they might be addressed. The Working Group now seeks evidence from scientists and from the public [12]. We badly need to get this right!

Panel

Cost-benefit
Making scientific data publicly available would be expensive. Who would (or should) pay? To what extent would this eat into the funding of primary research, and would it be compensated for by greater efficiency in the research system? Is the potential for misuse, misinterpretation, and the triggering of spurious findings from data a price worth paying for greater openness?

Triggering and timing
How and when should data release be triggered and how should data quality be assured? Should scientists be permitted to publish their own analysis and conclusions before releasing data to others? What about data that are never published?

Whose data?
Should open release apply only to those who receive grants from, or are employed through, public funds? What about data from clinical trials, or data from safety analyses by private companies used to inform decisions of legitimate public interest (eg, Deepwater Horizon and Fukushima)?

Confidentiality, privacy, security, and intellectual property
How should we cope with the need for confidentiality, anonymisation, and data security? How would intellectual property rights be protected? How should we balance personal privacy against wider public benefit?

International
Unless scientists from all jurisdictions allowed their data to be shared, would data-sharing have any real traction or meaning?

MR is the Chairman of the National Institute for Health and Clinical Excellence. PV is a Senior Vice President at GlaxoSmithKline. MW is the Director of the Wellcome Trust. GB has received travel support from the Royal Society. PV holds stock or stock options in GlaxoSmithKline, and is a board member (no remuneration) of Tempero Pharmaceuticals, A*Star Board Singapore, and NY Stem Cell Foundation. PV and MW are both board members of Genome Research Limited and OSCHR. MR declares that he has no conflicts of interest.