An Evaluation of NLP Methods to Extract Mathematical Token Descriptors

N/ACitations
Citations of this article
3Readers
Mendeley users who have this article in their library.
Get full text

Abstract

Mathematical formulae are a foundational component of information in all scientific and mathematical papers. Parsing meaning from these expressions by extracting textual descriptors of their variable tokens is a unique challenge that requires semantic and grammatical knowledge. In this work, we present a new manually-labeled dataset (called the MTDE dataset) of mathematical objects, the contexts in which they are defined, and their textual definitions. With this dataset, we evaluate the accuracy of several modern neural network models on two definition extraction tasks. While this is not a solved task, modern language models such as BERT perform well (∼ 90%). Both the dataset and neural network models (implemented in PyTorch jupyter notebooks) are available online to help aid future researchers in this space.

Cite

CITATION STYLE

APA

Hamel, E., Zheng, H., & Kani, N. (2022). An Evaluation of NLP Methods to Extract Mathematical Token Descriptors. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 13467 LNAI, pp. 329–343). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-16681-5_23

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free