Scholarly articles in mathematical fields often contain mathematical statements (theorems, propositions, etc.) and their proofs. In this paper, we present preliminary work on extracting such information from PDF documents, using several types of approaches: vision (using YOLO), natural language (with transformers), and styling information (with linear conditional random fields). Our main task is to identify which parts of a paper to label as theorem-like environments and proofs. We rely on a dataset collected from arXiv, in which the LaTeX sources of research articles are used to train the models.
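To illustrate how LaTeX sources can indicate which spans count as theorem-like environments or proofs, here is a minimal sketch in Python. It is not the authors' pipeline. The function name `label_environments`, the environment list `THEOREM_LIKE`, and the sample text are all assumptions made for illustration. Real articles often define their own environment names with `\newtheorem`, which this sketch does not resolve, and it does not handle nested environments of the same name.

```python
import re

# Assumed list of theorem-like environment names (hypothetical; real papers
# may declare different names through \newtheorem).
THEOREM_LIKE = ("theorem", "proposition", "lemma", "corollary", "definition")


def label_environments(tex):
    """Return (label, body) pairs for theorem-like environments and proofs."""
    names = "|".join(THEOREM_LIKE + ("proof",))
    # Match \begin{name} ... \end{name}, allowing a starred variant.
    pattern = re.compile(
        r"\\begin\{(" + names + r")\*?\}(.*?)\\end\{\1\*?\}", re.DOTALL
    )
    return [(m.group(1), m.group(2).strip()) for m in pattern.finditer(tex)]


# Hypothetical sample input, not taken from the dataset.
sample = r"""
\begin{theorem}
Every bounded monotone sequence converges.
\end{theorem}
\begin{proof}
Take the supremum.
\end{proof}
Some unrelated text.
"""

labels = label_environments(sample)
for kind, body in labels:
    print(kind, "->", body)
```

Labels obtained this way from the source would still need to be aligned with regions of the rendered PDF before they could supervise a vision, text, or styling model; that alignment step is outside the scope of this sketch.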
Mishra, S., Pluvinage, L., & Senellart, P. (2021). Towards extraction of theorems and proofs in scholarly articles. In DocEng 2021 - Proceedings of the 2021 ACM Symposium on Document Engineering. Association for Computing Machinery, Inc. https://doi.org/10.1145/3469096.3475059