Abstract
Science across all disciplines has become increasingly data-driven, leading to additional needs with respect to software for collecting, processing and analysing data. Thus, transparency about software used as part of the scientific process is crucial to understand provenance of individual research data and insights, is a prerequisite for reproducibility and can enable macro-analysis of the evolution of scientific methods over time. However, missing rigor in software citation practices renders the automated detection and disambiguation of software mentions a challenging problem. In this work, we provide a large-scale analysis of software usage and citation practices facilitated through an unprecedented knowledge graph of software mentions and affiliated metadata generated through supervised information extraction models trained on a unique gold standard corpus and applied to more than 3 million scientific articles. Our information extraction approach distinguishes different types of software and mentions, disambiguates mentions and outperforms the state-of-the-art significantly, leading to the most comprehensive corpus of 11.8 M software mentions that are described through a knowledge graph consisting of more than 300 M triples. Our analysis provides insights into the evolution of software usage and citation patterns across various fields, ranks of journals, and impact of publications. Whereas, to the best of our knowledge, this is the most comprehensive analysis of software use and citation at the time, all data and models are shared publicly to facilitate further research into scientific use and citation of software.
Schindler, D., Bensmann, F., Dietze, S., & Krüger, F. (2022). The role of software in science: a knowledge graph-based analysis of software mentions in PubMed Central. PeerJ Computer Science, 8. https://doi.org/10.7717/PEERJ-CS.835