A linear grammar approach to mathematical formula recognition from PDF

Abstract

Many approaches have been proposed over the years for the recognition of mathematical formulae from scanned documents. More recently a need has arisen to recognise formulae from PDF documents. Here we can avoid ambiguities introduced by traditional OCR approaches and instead extract perfect knowledge of the characters used in formulae directly from the document. This can be exploited by formula recognition techniques to achieve correct results and high performance. In this paper we revisit an old grammatical approach to formula recognition, that of Anderson from 1968, and assess its applicability with respect to data extracted from PDF documents. We identify some problems of the original method when applied to common mathematical expressions and show how they can be overcome. The simplicity of the original method leads to a very efficient recognition technique that not only is very simple to implement but also yields results of high accuracy for the recognition of mathematical formulae from PDF documents. © 2009 Springer-Verlag Berlin Heidelberg.

Cite

Baker, J. B., Sexton, A. P., & Sorge, V. (2009). A linear grammar approach to mathematical formula recognition from PDF. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5625 LNAI, pp. 201–216). https://doi.org/10.1007/978-3-642-02614-0_19
