Reconstructing the logical structure of a scientific publication using machine learning

Abstract

Semantic enrichment of scientific publications has an increasing impact on scholarly communication. This document describes our contribution to the Semantic Publishing Challenge 2016, which aims at investigating novel approaches for improving scholarly publishing through semantic technologies. We participated in Task 2 of this challenge, which requires the extraction of information from the content of a paper given as a PDF. The extracted information allows answering queries about the paper’s internal organisation and the context in which it was written. We build upon our contribution to the previous edition of the challenge, where we categorised meta-data, such as authors and affiliations, and extracted funding information. Here we use supervised machine learning techniques to extend the analysis of the logical structure of the document so as to identify section titles and captions of figures and tables. Furthermore, we employ clustering techniques to create the hierarchical table of contents of the article. Our system is modular in nature and allows separate training of different stages on different training sets.
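The idea of using clustering to build a hierarchical table of contents can be illustrated with a small sketch. The abstract does not say which features or clustering algorithm the system uses, so everything below is an assumption made for illustration: headings are assumed to arrive as (title, font size) pairs in reading order, font sizes are grouped by a simple one-dimensional gap-based clustering, and the names `TocNode`, `cluster_font_sizes`, `build_toc` and the `tolerance` parameter are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TocNode:
    """One entry in the reconstructed table of contents (hypothetical type)."""
    title: str
    level: int
    children: list = field(default_factory=list)


def cluster_font_sizes(sizes, tolerance=0.5):
    """Map each font size to a heading level (1 = largest font).

    Simple 1-D clustering: distinct sizes are sorted in descending order and
    a new cluster starts whenever the gap to the previous size exceeds
    `tolerance`. This stands in for whatever clustering the paper uses.
    """
    levels = {}
    level = 0
    previous = None
    for size in sorted(set(sizes), reverse=True):
        if previous is None or previous - size > tolerance:
            level += 1
        levels[size] = level
        previous = size
    return levels


def build_toc(headings, tolerance=0.5):
    """Build a tree from (title, font_size) pairs given in reading order."""
    levels = cluster_font_sizes([size for _, size in headings], tolerance)
    root = TocNode("ROOT", 0)
    stack = [root]  # path from the root to the most recent heading
    for title, size in headings:
        node = TocNode(title, levels[size])
        # Pop until the top of the stack is a strictly higher-level heading.
        while stack[-1].level >= node.level:
            stack.pop()
        stack[-1].children.append(node)
        stack.append(node)
    return root


def print_toc(node, indent=0):
    """Print the tree as an indented outline, skipping the artificial root."""
    for child in node.children:
        print("  " * indent + child.title)
        print_toc(child, indent + 1)


if __name__ == "__main__":
    # Made-up headings; sizes 14.0 and 13.9 fall into the same cluster.
    example = [
        ("Introduction", 14.0),
        ("Methods", 13.9),
        ("Features", 12.0),
        ("Clustering", 12.0),
        ("Results", 14.0),
    ]
    print_toc(build_toc(example))
```

Font size alone is a deliberately minimal feature. A real system would likely combine it with further layout and text cues, such as numbering prefixes or font weight.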

Cite

Klampfl, S., & Kern, R. (2016). Reconstructing the logical structure of a scientific publication using machine learning. In Communications in Computer and Information Science (Vol. 641, pp. 255–268). Springer Verlag. https://doi.org/10.1007/978-3-319-46565-4_20
