Neural ParsCit: a deep learning-based reference string parser

Abstract

We present a deep learning approach for the core digital libraries task of parsing bibliographic reference strings. We deploy the state-of-the-art long short-term memory (LSTM) neural network architecture, a variant of the recurrent neural network, to capture long-range dependencies in reference strings. We explore word embeddings and character-based word embeddings as an alternative to handcrafted features. We incrementally experiment with features, architectural configurations, and the diversity of the dataset. Our final model is an LSTM-based architecture that layers a linear-chain conditional random field (CRF) over the LSTM output. In extensive experiments on both English in-domain (computer science) and out-of-domain (humanities) test cases, as well as on multilingual data, our results show a significant gain (p < 0.01) over the reported state-of-the-art CRF-only parser.
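
To make the idea of layering a linear-chain CRF over LSTM output concrete, here is a minimal sketch of the decoding step only. It is not the authors' implementation. The label set, the per-token emission scores (standing in for LSTM outputs), and the transition scores are all hypothetical values chosen for illustration, and the function name viterbi_decode is an assumption.

```python
from typing import Dict, List


def viterbi_decode(emissions: List[Dict[str, float]],
                   transitions: Dict[str, Dict[str, float]],
                   labels: List[str]) -> List[str]:
    """Return the highest-scoring label sequence under a linear-chain CRF.

    emissions[t][y]      : score of label y at token t (stand-in for LSTM output)
    transitions[prev][y] : score of moving from label prev to label y
    """
    if not emissions:
        return []
    # best[t][y] is the best score of any path that ends in label y at token t.
    best = [{y: emissions[0][y] for y in labels}]
    back = []  # back[t-1][y] is the best previous label for y at token t
    for t in range(1, len(emissions)):
        scores, pointers = {}, {}
        for y in labels:
            prev = max(labels, key=lambda p: best[-1][p] + transitions[p][y])
            scores[y] = best[-1][prev] + transitions[prev][y] + emissions[t][y]
            pointers[y] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final label.
    path = [max(labels, key=lambda y: best[-1][y])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical scores for the tokens "Prasad", "Neural", "ParsCit", "2018".
labels = ["author", "title", "date"]
emissions = [
    {"author": 2.0, "title": 0.5, "date": 0.0},
    {"author": 0.2, "title": 1.5, "date": 0.0},
    {"author": 1.0, "title": 0.9, "date": 0.0},  # "ParsCit" looks name-like
    {"author": 0.0, "title": 0.1, "date": 2.5},
]
transitions = {
    "author": {"author": 1.0, "title": 0.5, "date": 0.0},
    "title": {"author": -1.0, "title": 1.0, "date": 0.5},
    "date": {"author": -1.0, "title": -1.0, "date": 1.0},
}

# Per-token argmax ignores label transitions entirely.
greedy = [max(labels, key=lambda y: e[y]) for e in emissions]
decoded = viterbi_decode(emissions, transitions, labels)
print(greedy)   # ['author', 'title', 'author', 'date']
print(decoded)  # ['author', 'title', 'title', 'date']
```

In this toy example the per-token argmax mislabels the third token as an author, while the transition scores penalize a jump from title back to author and so keep the title span contiguous. That is the kind of sequence-level consistency a CRF layer is meant to add on top of per-token scores.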

Cite

Prasad, A., Kaur, M., & Kan, M. Y. (2018). Neural ParsCit: a deep learning-based reference string parser. International Journal on Digital Libraries, 19(4), 323–337. https://doi.org/10.1007/s00799-018-0242-1
