Cascade network with deformable composite backbone for formula detection in scanned document images

Khurram Azeem Hashmi; Alain Pagani; Marcus Liwicki; Didier Stricker; Muhammad Zeshan Afzal

Journal Article (Open Access)

Applied Sciences (Switzerland) (2021) 11(16)

DOI: 10.3390/app11167610

Abstract

This paper presents a novel architecture for detecting mathematical formulas in document images, which is an important step for reliable information extraction in several domains. Recently, Cascade Mask R-CNN networks have been introduced to solve object detection in computer vision. In this paper, we propose two modifications to the existing Cascade Mask R-CNN architecture. First, the proposed network uses deformable convolutions instead of conventional convolutions in the backbone network to better localize areas of interest. Second, it uses a dual backbone of ResNeXt-101 with composite connections at the parallel stages. In addition, the proposed network is end-to-end trainable. We evaluate the proposed approach on the ICDAR-2017 POD and Marmot datasets. The proposed approach demonstrates state-of-the-art performance on ICDAR-2017 POD at a higher IoU threshold with an F1-score of 0.917, reducing the relative error by 7.8%. Moreover, we achieve a correct detection accuracy of 81.3% on embedded formulas on the Marmot dataset, which corresponds to a relative error reduction of 30%.
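The abstract reports results in terms of an IoU threshold, an F1-score, and a relative error reduction. As a minimal sketch of how these quantities are commonly computed, the helpers below use the standard textbook definitions; the function names, the `(x1, y1, x2, y2)` box format, and the definition of error as `1 - score` are assumptions made for illustration and are not taken from the paper's evaluation code.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Width and height of the overlap are clamped at zero for disjoint boxes.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def f1_score(tp, fp, fn):
    """F1 from true positives, false positives, and false negatives."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def relative_error_reduction(old_score, new_score):
    """Fraction by which the error (assumed to be 1 - score) shrinks."""
    old_err = 1.0 - old_score
    new_err = 1.0 - new_score
    return (old_err - new_err) / old_err
```

In a typical detection benchmark, a predicted formula box counts as a true positive when its IoU with a ground-truth box meets the chosen threshold, so raising the threshold makes the F1-score a stricter measure of localization quality.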

Citation (APA)

Hashmi, K. A., Pagani, A., Liwicki, M., Stricker, D., & Afzal, M. Z. (2021). Cascade network with deformable composite backbone for formula detection in scanned document images. Applied Sciences (Switzerland), 11(16). https://doi.org/10.3390/app11167610
