Reliance on science by inventors: Hybrid extraction of in-text patent-to-article citations

Abstract

We curate and characterize a complete set of citations from patents to scientific articles, including 16.8 million from the full text of USPTO and EPO patents. Combining hand-tuned heuristics and the GROBID machine-learning package, we achieve much higher performance than machine learning alone. Recall is evaluated with a set of 5939 randomly sampled, cross-verified “known good” citations, which the authors have never seen. At 99.4% precision, we achieve recall rates of 78% for the full test set and 88% for references specified without mistakes. We compare these “in-text” citations with those on the front page of patents. In-text citations are more diverse temporally, geographically, and topically; moreover, they are less self-referential and less likely to be copied from one patent to the next. In-text citations have dropped from two-thirds of all patent-to-article citations half a century ago to about one-third today. In replicating two articles that use only front-page citations, we show that failing to capture in-text citations leads to understating the role of academic science in commercial invention. All patent-to-article citations, the known-good test set, and the source code are available at http://relianceonscience.org.
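
The recall and precision figures above follow the standard definitions: recall is the share of known-good citations the extractor recovers, and precision is the share of extracted citations that are correct. The sketch below computes both. It makes several assumptions. Each citation is represented as a hypothetical (patent_id, article_id) pair. The known-good set is held out from the extractor. Precision is estimated against a separately verified set of correct links. The function names and toy identifiers are illustrative only. They are not taken from the authors' released code, and the paper's exact evaluation procedure may differ.

```python
from typing import Hashable, Iterable, Set, Tuple

# Hypothetical representation: one patent-to-article citation link
# modeled as a (patent_id, article_id) pair.
Link = Tuple[Hashable, Hashable]


def recall(extracted: Iterable[Link], known_good: Iterable[Link]) -> float:
    """Share of held-out known-good links that the extractor recovered."""
    gold: Set[Link] = set(known_good)
    if not gold:
        raise ValueError("known-good set is empty")
    return len(gold & set(extracted)) / len(gold)


def precision(extracted: Iterable[Link], verified_correct: Iterable[Link]) -> float:
    """Share of extracted links that were verified as correct."""
    predicted: Set[Link] = set(extracted)
    if not predicted:
        raise ValueError("no extracted links")
    return len(predicted & set(verified_correct)) / len(predicted)


if __name__ == "__main__":
    # Toy, made-up identifiers; not drawn from the actual dataset.
    extracted = {("US1", "A1"), ("US1", "A2"), ("US2", "A3")}
    known_good = {("US1", "A1"), ("US2", "A3"), ("US3", "A4"), ("US3", "A5")}
    verified = {("US1", "A1"), ("US2", "A3")}
    print(recall(extracted, known_good))  # 0.5
    print(precision(extracted, verified))
```

Using sets makes each metric a simple intersection count. In practice, a match would first require normalizing noisy in-text references to canonical article identifiers. That matching step is where the hybrid heuristics-plus-GROBID approach does its work, and it is not modeled here.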

Citation (APA)

Marx, M., & Fuegi, A. (2022). Reliance on science by inventors: Hybrid extraction of in-text patent-to-article citations. Journal of Economics and Management Strategy, 31(2), 369–392. https://doi.org/10.1111/jems.12455