Combining CNN and Transformer as Encoder to Improve End-to-End Handwritten Mathematical Expression Recognition Accuracy

Abstract

Attention-based encoder-decoder (AED) models are increasingly used in handwritten mathematical expression recognition (HMER). Given the recent success of the Transformer in computer vision and the variety of attempts to combine Transformers with convolutional neural networks (CNNs), in this paper we study three ways of combining Transformer and CNN designs to improve AED-based HMER models: 1) the Tandem way, which feeds CNN-extracted features into a Transformer encoder to capture global dependencies; 2) the Parallel way, which adds a Transformer encoder branch that takes raw image patches as input and concatenates its output with the CNN's features to form the final representation; 3) the Mixing way, which replaces the convolution layers in the CNN's last stage with multi-head self-attention (MHSA). We compare these three methods on the CROHME benchmark. On CROHME 2016 and 2019, the Tandem way attains ExpRates of 54.85% and 58.56%, respectively; the Parallel way attains 55.63% and 57.39%; and the Mixing way achieves 53.93% and 55.64%. These results indicate that the Parallel and Tandem ways outperform the Mixing way, with little difference between the two.
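The three encoder variants are only described at a high level in the abstract. As a rough illustration of the Tandem arrangement, the sketch below wires a small CNN backbone into a standard Transformer encoder in PyTorch; the backbone, layer sizes, and module names are assumptions made for illustration, not the authors' actual configuration, and 2-D positional encodings are omitted for brevity.

```python
# Hypothetical sketch of the "Tandem" encoder: CNN features -> Transformer encoder.
# All sizes and the backbone itself are illustrative assumptions, not the paper's setup.
import torch
import torch.nn as nn

class TandemEncoder(nn.Module):
    def __init__(self, d_model=256, nhead=8, num_layers=4):
        super().__init__()
        # Small CNN backbone that downsamples the input image to d_model channels.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(128, d_model, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Transformer encoder applied to the flattened feature-map positions
        # to capture global dependencies among the CNN-extracted features.
        # (A real model would add 2-D positional encodings before this step.)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=num_layers)

    def forward(self, images):  # images: (B, 1, H, W) grayscale expression images
        feats = self.cnn(images)                   # (B, C, H', W')
        tokens = feats.flatten(2).transpose(1, 2)  # (B, H'*W', C) sequence of positions
        return self.transformer(tokens)            # (B, H'*W', C) contextualized features

# Usage example (shapes only):
# enc = TandemEncoder()
# out = enc(torch.randn(2, 1, 128, 384))  # -> torch.Size([2, 768, 256]) for a 16x48 feature map
```

The Parallel way would instead run such a Transformer branch directly on raw image patches and concatenate its output with the CNN features, while the Mixing way would swap the convolutions of the CNN's last stage for MHSA blocks; both follow the same general pattern as the sketch above.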

Citation (APA)

Zhang, Z., & Zhang, Y. (2022). Combining CNN and Transformer as Encoder to Improve End-to-End Handwritten Mathematical Expression Recognition Accuracy. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 13639 LNCS, pp. 185–197). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-21648-0_13
