Envisioning a biomedical data reuse registry.
AMIA Annual Symposium Proceedings (2008)
- PubMed: 18998885
Abstract
Repurposing research data holds many benefits for the advancement of biomedicine, yet is very difficult to measure and evaluate. We propose a data reuse registry to maintain links between primary research datasets and studies that reuse this data. Such a resource could help recognize investigators whose work is reused, illuminate aspects of reusability, and evaluate policies designed to encourage data sharing and reuse.
Envisioning a Data Reuse Registry
Heather A. Piwowar and Wendy W. Chapman
University of Pittsburgh Department of Biomedical Informatics
Motivation
The full benefits of data sharing will be realized only when we can give
investigators incentives to share their data[1] and can quantify the value created
by data reuse.[2] Current practices for recognizing the provenance of reused data
include an acknowledgment, a listing of accession numbers, a database search
strategy, and sometimes a citation within the article. Because these practices are
inconsistent and mostly free-text, it is very difficult to identify and tabulate
reuse, and thus to reward and encourage data sharing. We propose one solution: a
Data Reuse Registry.
What is a data reuse registry?
We define a Data Reuse Registry (DRR) as a database of links between
biomedical research studies and the datasets used in those studies. Reusing
articles may be identified by their PubMed IDs, and datasets either by accession
numbers within established databases or by the PubMed IDs of the studies that
originally generated the data.
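To make this definition concrete, the following is a minimal Python sketch of one possible DRR record, assuming a reusing article is keyed by its PubMed ID and a dataset is identified either by a (database, accession number) pair or by the PubMed ID of the originating study. The class and field names (`DatasetRef`, `ReuseLink`, `key`) are hypothetical illustrations; the proposal does not specify a schema.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical record types; the DRR proposal does not define a schema.
@dataclass(frozen=True)
class DatasetRef:
    """A reused dataset, identified in exactly one of two ways."""

    database: Optional[str] = None     # name of an established database
    accession: Optional[str] = None    # accession number within that database
    origin_pmid: Optional[str] = None  # PubMed ID of the study that originated the data

    def __post_init__(self) -> None:
        uses_accession = self.accession is not None
        uses_pmid = self.origin_pmid is not None
        # Require exactly one identification scheme.
        if uses_accession == uses_pmid:
            raise ValueError("give either an accession number or an originating PubMed ID")
        if uses_accession and self.database is None:
            raise ValueError("an accession number needs the name of its database")

    def key(self) -> str:
        """Single string key, e.g. 'PMID:<id>' or '<database>:<accession>'."""
        if self.origin_pmid is not None:
            return f"PMID:{self.origin_pmid}"
        return f"{self.database}:{self.accession}"


@dataclass(frozen=True)
class ReuseLink:
    """One hypothetical DRR row: a reusing article linked to one dataset it reused."""

    reuse_pmid: str  # PubMed ID of the reusing article
    dataset: DatasetRef
```

For example, `ReuseLink("999", DatasetRef(database="ExampleDB", accession="X001"))` would record a single reuse; every identifier there is a placeholder. Keeping one row per (article, dataset) pair lets a study that reuses several datasets appear as several links.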
How would the DRR be used?
Information from the DRR could be used to recognize investigators whose work
is reused, illuminate aspects of reusability, examine the variety of purposes for
which a given dataset is reused, and evaluate policies designed to encourage
data sharing and reuse.
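As one illustration of how the registry could help recognize investigators whose work is reused, the sketch below counts the distinct articles that reuse each dataset. It assumes links are stored as (reusing PubMed ID, dataset key) string pairs; the function name `reuse_counts` and the sample identifiers are hypothetical placeholders, and attributing counts to investigators would additionally require a mapping from datasets to their creators.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def reuse_counts(links: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Count distinct reusing articles per dataset (hypothetical helper).

    `links` holds (reuse_pmid, dataset_key) pairs. If the same pair is
    reported more than once, it is counted only once.
    """
    reusers: Dict[str, Set[str]] = defaultdict(set)
    for reuse_pmid, dataset_key in links:
        reusers[dataset_key].add(reuse_pmid)
    return {key: len(pmids) for key, pmids in reusers.items()}


if __name__ == "__main__":
    # Placeholder identifiers for illustration only.
    sample = [
        ("1001", "ExampleDB:X001"),
        ("1002", "ExampleDB:X001"),
        ("1001", "ExampleDB:X001"),  # duplicate report of the same reuse
        ("1003", "PMID:2001"),
    ]
    print(reuse_counts(sample))  # {'ExampleDB:X001': 2, 'PMID:2001': 1}
```

Counting distinct articles rather than raw rows matters here because the same reuse could plausibly arrive through more than one population mechanism.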
How would the DRR be populated?
We anticipate several mechanisms for populating the DRR:
* Voluntary submissions
* Automatic detection from the literature[3]
* Prospective submission of reuse plans, followed by automatic tracking
We envision collecting prospective citations in two steps. First, prior to
publication, investigators visit a web page and list the datasets and accession
numbers reused in their research, thereby creating an entry record in the DRR
database. In return, the reusing investigators receive best-practice free-text
wording they can insert into their acknowledgments section, a list of references to
the papers that originated the data, value-added information such as links to other
studies that have previously reused the same data, and a reference to the new DRR
entry record. When authors cite this DRR entry in their reuse study as part of their
data-use acknowledgment, the second step of data input can be done automatically:
the published literature will be mined periodically for citations to DRR entries.
These citations will be combined with the information provided when each entry was
created, explicitly linking published papers to the datasets they reused. The result
will be searchable by anyone wishing to understand the reuse impact of an
investigator, institution, or database.
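The two-step prospective workflow can be sketched in Python under several explicit assumptions: entries live in an in-memory dictionary, entry identifiers follow a made-up `DRR:` plus six-digit format, and "mining" is a regular-expression scan of article text. None of these names (`DataReuseRegistry`, `register`, `mine_article`, `DRR_ID_PATTERN`) or formats come from the proposal; this is an illustration, not the authors' design.

```python
import itertools
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Assumed, hypothetical citation format for DRR entries, e.g. "DRR:000001".
DRR_ID_PATTERN = re.compile(r"\bDRR:(\d{6})\b")


@dataclass
class DrrEntry:
    entry_id: str
    datasets: List[str]               # dataset keys listed before publication
    reuse_pmid: Optional[str] = None  # set once a published citation is found


@dataclass
class DataReuseRegistry:
    entries: Dict[str, DrrEntry] = field(default_factory=dict)
    _counter: itertools.count = field(
        default_factory=lambda: itertools.count(1), repr=False
    )

    def register(self, datasets: List[str]) -> str:
        """Step 1: before publication, record the datasets a study will reuse
        and return an entry ID the authors can cite in their acknowledgments."""
        entry_id = f"DRR:{next(self._counter):06d}"
        self.entries[entry_id] = DrrEntry(entry_id, list(datasets))
        return entry_id

    def mine_article(self, pmid: str, text: str) -> List[str]:
        """Step 2: scan a published article for citations to DRR entries and
        link the article's PubMed ID to each not-yet-linked entry it cites."""
        linked: List[str] = []
        for match in DRR_ID_PATTERN.finditer(text):
            entry_id = f"DRR:{match.group(1)}"
            entry = self.entries.get(entry_id)
            if entry is not None and entry.reuse_pmid is None:
                entry.reuse_pmid = pmid
                linked.append(entry_id)
        return linked
```

For instance, the first call to `register(["ExampleDB:X001"])` returns `"DRR:000001"`, and a later `mine_article("1001", "... (DRR:000001) ...")` links PubMed ID 1001 to that entry. A real service would need a persistent store and more robust matching than a single regular expression.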
Conclusion
While the DRR may not be a comprehensive solution, we believe it is a starting
point for addressing the important problem of evaluating, encouraging, and
rewarding data sharing and reuse.
References
1. Compete, collaborate, compel. Nat Genet. 2007;39(8).
2. Ball CA, Sherlock G, Brazma A. Funding high-throughput data sharing. Nat Biotechnol. 2004 Sep;22(9).
3. Piwowar HA, Chapman WW. Identifying data sharing in the biomedical literature. AMIA Annu Symp Proc. 2008.
Acknowledgements
HP is supported by NLM training grant 5T15-LM007059-19.
WC is funded through NLM grant 1 R01LM009427-01.
Nature Precedings : doi:10.1038/npre.2008.2152.1 : Posted 4 Aug 2008