Structured References from PDF Articles: Assessing the Tools for Bibliographic Reference Extraction and Parsing

Abstract

Many solutions have been proposed for extracting bibliographic references from PDF papers. Machine learning, rule-based and regular-expression approaches are among the methods most commonly adopted by tools addressing this task. This work aims to identify and evaluate all and only the tools which, given a full-text paper in PDF format, can recognise, extract and parse bibliographic references. We identified seven tools: Anystyle, Cermine, ExCite, Grobid, Pdfssa4met, Scholarcy and Science Parse. We compared and evaluated them against a corpus of 56 PDF articles published in 27 subject areas. Anystyle obtained the best overall score, followed by Cermine. However, in some subject areas, other tools had better results for specific tasks.
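
To illustrate what a regular-expression approach to reference parsing can look like, here is a minimal sketch in Python. It is not taken from any of the seven tools: the pattern `APA_LIKE` and the helper `parse_reference` are hypothetical names introduced only for illustration, and the pattern assumes a single APA-like reference of the form "Authors (Year). Title. Rest".

```python
import re
from typing import Optional

# Hypothetical pattern for one APA-like reference string. It is an assumption
# made for illustration; real tools cover many more styles and edge cases.
APA_LIKE = re.compile(
    r"^(?P<authors>.+?)\s+\((?P<year>\d{4})\)\.\s+"
    r"(?P<title>.+?)\.\s+(?P<rest>.*)$"
)


def parse_reference(ref: str) -> Optional[dict]:
    """Return the fields of an APA-like reference, or None if it does not match."""
    match = APA_LIKE.match(ref.strip())
    return match.groupdict() if match else None


example = (
    "Cioffi, A., & Peroni, S. (2022). Structured References from PDF Articles: "
    "Assessing the Tools for Bibliographic Reference Extraction and Parsing. "
    "In Lecture Notes in Computer Science."
)
parsed = parse_reference(example)
print(parsed)
```

A pattern like this breaks as soon as the citation style changes, for example when a title contains a full stop followed by a space or when the year is not in parentheses. That fragility may be one reason tools also rely on rule-based and machine learning methods.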

Cite

Cioffi, A., & Peroni, S. (2022). Structured References from PDF Articles: Assessing the Tools for Bibliographic Reference Extraction and Parsing. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 13541 LNCS, pp. 425–432). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-16802-4_42
