Envisioning a biomedical data reuse registry.

by Heather A Piwowar, Wendy W Chapman
AMIA Annual Symposium Proceedings (2008)

Abstract

Repurposing research data holds many benefits for the advancement of biomedicine, yet is very difficult to measure and evaluate. We propose a data reuse registry to maintain links between primary research datasets and studies that reuse this data. Such a resource could help recognize investigators whose work is reused, illuminate aspects of reusability, and evaluate policies designed to encourage data sharing and reuse.

Envisioning a Data Reuse Registry
Heather A. Piwowar and Wendy W. Chapman
University of Pittsburgh Department of Biomedical Informatics
Motivation
The full benefits of data sharing will only be realized when we can motivate investigators to share their data[1] and quantify the value created by data reuse[2]. Current practices for recognizing the provenance of reused data include an acknowledgment, a list of accession numbers, a database search strategy, and sometimes a citation within the article. Because these mechanisms are informal and inconsistent, reuse is very difficult to identify and tabulate, and therefore difficult to reward and encourage. We propose a solution: a Data Reuse Registry.
What is a data reuse registry?
We define a Data Reuse Registry (DRR) as a database of links between biomedical research studies and the datasets they use. Reusing articles may be identified by their PubMed IDs, and datasets either by accession numbers within established databases or by the PubMed IDs of the studies that originated the data.
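To make the definition concrete, the following Python sketch models a DRR as an in-memory collection of links. It is a minimal illustration under stated assumptions, not a design from this proposal: the class and field names (`DatasetRef`, `ReuseLink`, `DataReuseRegistry`, `add_link`, and so on) are hypothetical, each dataset is identified either by an accession number or by the PubMed ID of its originating study, and a real registry would sit on a persistent database.

```python
from dataclasses import dataclass, field
from typing import Literal

# A dataset is referenced either by an accession number in an established
# database or by the PubMed ID of the study that originated the data.
DatasetKind = Literal["accession", "pmid"]


@dataclass(frozen=True)
class DatasetRef:
    kind: DatasetKind
    identifier: str       # accession number or originating PubMed ID
    database: str = ""    # hypothetical: source database name for accessions


@dataclass
class ReuseLink:
    reuse_pmid: str       # PubMed ID of the article that reused the data
    datasets: list[DatasetRef] = field(default_factory=list)


class DataReuseRegistry:
    """Hypothetical in-memory DRR linking reusing articles to datasets."""

    def __init__(self) -> None:
        self._links: dict[str, ReuseLink] = {}

    def add_link(self, reuse_pmid: str, dataset: DatasetRef) -> None:
        """Record that article `reuse_pmid` reused `dataset` (idempotent)."""
        link = self._links.setdefault(reuse_pmid, ReuseLink(reuse_pmid))
        if dataset not in link.datasets:
            link.datasets.append(dataset)

    def datasets_reused_by(self, reuse_pmid: str) -> list[DatasetRef]:
        """All datasets recorded as reused by one article."""
        link = self._links.get(reuse_pmid)
        return list(link.datasets) if link else []

    def studies_reusing(self, dataset: DatasetRef) -> list[str]:
        """Sorted PubMed IDs of every recorded article that reused `dataset`."""
        return sorted(pmid for pmid, link in self._links.items()
                      if dataset in link.datasets)


if __name__ == "__main__":
    # Placeholder identifiers, not real records.
    drr = DataReuseRegistry()
    geo = DatasetRef("accession", "GSE1234", "GEO")
    origin = DatasetRef("pmid", "10000001")
    drr.add_link("20000001", geo)
    drr.add_link("20000002", geo)
    drr.add_link("20000002", origin)
    print(drr.studies_reusing(geo))  # ['20000001', '20000002']
```

Keying links by the reusing article's PubMed ID follows the definition above, while the reverse lookup in `studies_reusing` is what would let such a registry answer which studies reused a given dataset.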
How would the DRR be used?
Information from the DRR could be used to recognize investigators whose work
is reused, illuminate aspects of reusability, examine the variety of purposes for
which a given dataset is reused, and evaluate policies designed to encourage
data sharing and reuse.
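As an illustration of the first use, recognizing investigators whose work is reused, the sketch below tallies reuse per dataset and per originating investigator. It assumes, hypothetically, that DRR contents can be exported as (reusing PMID, dataset identifier) pairs and that a separate mapping from each dataset to the investigator who originated it is available. Neither the function names nor that mapping come from the proposal.

```python
from collections import Counter, defaultdict

# Hypothetical export format: (reusing article PMID, dataset identifier).
Link = tuple[str, str]


def reuse_counts_by_dataset(links: list[Link]) -> Counter:
    """Number of distinct reusing articles recorded for each dataset."""
    return Counter(dataset for _, dataset in set(links))


def reuse_counts_by_investigator(links: list[Link],
                                 dataset_owner: dict[str, str]) -> Counter:
    """Number of distinct articles that reused any of an investigator's datasets.

    `dataset_owner` is a hypothetical map from dataset identifier to the
    investigator who originated it; unmapped datasets are skipped.
    """
    reusers: dict[str, set[str]] = defaultdict(set)
    for reuse_pmid, dataset in links:
        owner = dataset_owner.get(dataset)
        if owner is not None:
            reusers[owner].add(reuse_pmid)
    return Counter({owner: len(pmids) for owner, pmids in reusers.items()})


if __name__ == "__main__":
    # Placeholder links; ("p1", "d1") is deliberately listed twice.
    links = [("p1", "d1"), ("p2", "d1"), ("p2", "d2"), ("p1", "d1")]
    print(reuse_counts_by_dataset(links))  # Counter({'d1': 2, 'd2': 1})
    print(reuse_counts_by_investigator(links, {"d1": "A", "d2": "A"}))  # Counter({'A': 2})
```

Counting distinct reusing articles rather than raw links keeps a single paper that reuses several of one investigator's datasets from being counted more than once.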
How would the DRR be populated?
We anticipate several mechanisms for populating the DRR:
* Voluntary submissions
* Automatic detection from the literature[3]
* Prospective submission of reuse plans, followed by automatic tracking
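Automatic detection from the literature could begin with something as simple as scanning article text for repository accession numbers. The sketch below is a deliberately naive stand-in and is not the method of [3]. It assumes that only GEO series accessions ("GSE" followed by digits) matter, and the function name is hypothetical. A mentioned accession does not by itself distinguish reuse from an article depositing its own data, so these matches are candidates for review rather than confirmed DRR links.

```python
import re

# Assumption: only GEO series accessions ("GSE" followed by digits) are
# recognized. Other repositories would need their own patterns.
GEO_SERIES = re.compile(r"\bGSE\d+\b")


def candidate_reuse_links(pmid: str, full_text: str) -> set[tuple[str, str]]:
    """Return (article PMID, accession) pairs for accessions mentioned in the text.

    A mention alone cannot tell reuse apart from an article depositing its
    own data, so the results are candidates for review, not confirmed links.
    """
    return {(pmid, accession) for accession in GEO_SERIES.findall(full_text)}


if __name__ == "__main__":
    text = "We reanalyzed GSE1234 and GSE5678; see also GSE1234."
    print(sorted(candidate_reuse_links("p1", text)))
    # [('p1', 'GSE1234'), ('p1', 'GSE5678')]
```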
We envision collecting prospective citations in two steps. First, before publication, investigators visit a web page and list the datasets and accession numbers reused in their research, creating an entry record in the DRR database. In return, the reusing investigators receive best-practice text that they can insert into their acknowledgments section, a list of references to the papers that originated the data, value-added information such as links to other studies that previously reused the data, and a citable reference to the new DRR entry record. Second, when authors cite this DRR entry in their reuse study as part of their data-use acknowledgment, the remaining data input can happen automatically: the published literature will be mined periodically for citations to DRR entries. Each citation found will be combined with the information provided when the entry was created, explicitly linking the published paper to the datasets it reused. The resulting links will be searchable by anyone wishing to understand the reuse impact of an investigator, institution, or database.
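The two-step prospective workflow can be sketched in Python as follows. Everything here is assumed for illustration: the entry ID format (`DRR-` plus six digits), the class and method names, and the wording of the acknowledgment text are hypothetical. The reference list and value-added links that investigators would also receive in step one are omitted.

```python
import itertools
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical citation format for DRR entries, e.g. "DRR-000042".
ENTRY_ID = re.compile(r"\bDRR-\d{6}\b")


@dataclass
class DrrEntry:
    entry_id: str
    datasets: list[str]                   # accession numbers or originating PMIDs
    published_pmid: Optional[str] = None  # filled in automatically in step two


class ProspectiveRegistry:
    def __init__(self) -> None:
        self.entries: dict[str, DrrEntry] = {}
        self._next_id = itertools.count(1)

    def register(self, datasets: list[str]) -> tuple[str, str]:
        """Step one, before publication: record the datasets a study reuses.

        Returns the new entry ID and suggested acknowledgment text that cites it.
        """
        entry_id = f"DRR-{next(self._next_id):06d}"
        self.entries[entry_id] = DrrEntry(entry_id, list(datasets))
        acknowledgment = (
            f"This study reused the following datasets: {', '.join(datasets)} "
            f"(Data Reuse Registry entry {entry_id})."
        )
        return entry_id, acknowledgment

    def mine_publication(self, pmid: str, text: str) -> list[str]:
        """Step two, run periodically over new literature: find citations to
        DRR entries and link the citing article to the registered datasets.

        Each entry stands for one reuse study, so only the first published
        article that cites it is linked. Returns the datasets newly linked.
        """
        linked: list[str] = []
        for entry_id in sorted(set(ENTRY_ID.findall(text))):
            entry = self.entries.get(entry_id)
            if entry is not None and entry.published_pmid is None:
                entry.published_pmid = pmid
                linked.extend(entry.datasets)
        return linked


if __name__ == "__main__":
    # Placeholder identifiers, not real records.
    registry = ProspectiveRegistry()
    entry_id, ack = registry.register(["GSE1234", "10000001"])
    print(ack)
    print(registry.mine_publication("20000001", f"Data were reused ({entry_id})."))
    # ['GSE1234', '10000001']
```

Keeping the dataset list on the entry created in step one is what lets step two recover the full article-to-dataset links from a single short citation in the published text.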
Conclusion
While the DRR may not be a comprehensive solution, we believe it is a starting point for addressing the important problem of evaluating, encouraging, and rewarding data sharing and reuse.
References
1. Compete, collaborate, compel. Nat Genet. 2007;39(8).
2. Ball CA, Sherlock G, Brazma A. Funding high-throughput data sharing. Nat Biotechnol. 2004 Sep;22(9).
3. Piwowar HA, Chapman WW. Identifying data sharing in the biomedical literature. AMIA Annu Symp Proc. 2008.
Acknowledgements
HP is supported by NLM training grant 5T15-LM007059-19; WC is funded through NLM grant 1 R01LM009427-01.
Nature Precedings : doi:10.1038/npre.2008.2152.1 : Posted 4 Aug 2008
