Data Sharing and Inductive Learning — Toward Healthy Birth, Growth, and Development

  • Jumbe N
  • Murray J
  • Kern S

Abstract

Perspective: The potential health benefits from sharing participant-level clinical research data for the purpose of secondary analysis or meta-analysis have been widely touted. Although some researchers remain wary about sharing data, recent policies and proposals by funders, scientific journals, research institutions, and international health organizations mean that data sharing, in one form or another, is inevitable. Now is therefore the time to focus on developing practices for data sharing that are effective, efficient, equitable, and ethical. In the process, we may need to question the assumption that more is better. Simply making more data openly available may not lead to analyses that are relevant and that are actually applied to improve health.

A variety of data-sharing platform models have evolved to meet the needs of various communities. As more partners in science mandate sharing of data, these platforms and repositories are likely to grow rapidly in number and size. But they will also need to evolve to avoid perils that could undermine the benefits of data sharing.

One of the risks posed by these expanding repositories is the production of “data dumpsters”: repositories of data without the metadata, data dictionaries, or documentation needed for meaningful or correct reanalysis. Fulfilling an obligation to share data before good practices in data formatting and documentation have been established and replicated may allow researchers to check the “data shared” box, but it may also result in an epidemic of accessible data of limited usefulness. There is currently inadequate funding and expertise for curating data to a standard and quality suitable for external secondary use; researchers must bear the costs themselves or opt, as many currently do, to make raw data available without the explanatory documentation necessary to make them useful. Most repositories are not equipped to rectify this problem, nor do they see this function as part of their mandate.

Another concern is the risk of widening the research-output gap between low-resource and high-resource countries. Analysts in rich countries have the skills and resources to use and reanalyze data collected in lower-income countries, whereas the reverse is rarely true. When medical journals mandate data sharing, researchers in low-income countries will have no choice but to allow external access to those who are better equipped to make use of the data. But better equipped does not mean better qualified: if there’s no requirement to involve primary researchers when conducting secondary analyses, misinterpretation of the data is possible; indeed, it is likely, especially in the case of data sets for which high-quality data management and descriptors are lacking. Reuse of data that produces incorrect results does not improve health outcomes.

More investment is needed in platforms that can standardize, clean, and curate data into the usable formats that are required for sharing data effectively. Those systems will also have to ensure ethical and responsible data sharing that maximizes the use of available data. In global health, that means encouraging engagement from researchers around the world and ensuring appropriate acknowledgment of the data generators.

Cite

Jumbe, N. L., Murray, J. C., & Kern, S. (2016). Data Sharing and Inductive Learning — Toward Healthy Birth, Growth, and Development. New England Journal of Medicine, 374(25), 2415–2417. https://doi.org/10.1056/nejmp1605441