2D Positional Embedding-based Transformer for Scene Text Recognition

  • Raisi Z
  • Naiel M A
  • Fieguth P
  • Wardell S
  • Zelek J

Abstract

Recent state-of-the-art scene text recognition methods are primarily based on Recurrent Neural Networks (RNNs). However, these methods require one-dimensional (1D) features and are not designed for recognizing irregular-text instances, owing to the loss of spatial information present in the original two-dimensional (2D) images. In this paper, we leverage a Transformer-based architecture for recognizing both regular and irregular text-in-the-wild images. The proposed method combines a 2D positional encoder with the Transformer architecture to preserve the spatial information of 2D image features better than previous methods. Experiments on popular benchmarks, including the challenging COCO-Text dataset, demonstrate that the proposed scene text recognition method outperforms the state-of-the-art in most cases, especially on irregular-text recognition.
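The abstract's key idea is replacing the 1D positional encoding of the original Transformer with a 2D variant, so that each location in the feature map retains both its row and column position. The paper does not specify its exact formulation here, but a common construction (assumed below, not taken from the paper) splits the embedding channels in half: one half carries a sinusoidal encoding of the y (row) coordinate, the other half of the x (column) coordinate. A minimal NumPy sketch, with hypothetical function names:

```python
import numpy as np

def sinusoidal_1d(length, dim):
    """Standard 1D sinusoidal positional encoding (Vaswani et al., 2017)."""
    pos = np.arange(length)[:, None]            # (length, 1)
    i = np.arange(dim // 2)[None, :]            # (1, dim/2)
    angles = pos / (10000 ** (2 * i / dim))     # (length, dim/2)
    enc = np.zeros((length, dim))
    enc[:, 0::2] = np.sin(angles)               # even channels: sine
    enc[:, 1::2] = np.cos(angles)               # odd channels: cosine
    return enc

def positional_encoding_2d(height, width, dim):
    """Hypothetical 2D encoding: half the channels encode the row (y)
    position, the other half the column (x) position, giving every
    spatial location in the feature map a unique embedding."""
    assert dim % 4 == 0, "dim must be divisible by 4"
    half = dim // 2
    y_enc = sinusoidal_1d(height, half)         # (H, dim/2)
    x_enc = sinusoidal_1d(width, half)          # (W, dim/2)
    enc = np.zeros((height, width, dim))
    enc[:, :, :half] = y_enc[:, None, :]        # broadcast across columns
    enc[:, :, half:] = x_enc[None, :, :]        # broadcast across rows
    return enc

# Encoding for an 8x32 feature map with 512-dim embeddings,
# as might be produced by a CNN backbone before the Transformer.
pe = positional_encoding_2d(8, 32, 512)
print(pe.shape)  # (8, 32, 512)
```

In this sketch the encoding is added to the backbone's feature map before the map is flattened into a token sequence, so tokens from different rows remain distinguishable even after flattening; this is what lets the decoder attend over curved or rotated (irregular) text layouts.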

Citation (APA)

Raisi, Z., Naiel, M. A., Fieguth, P., Wardell, S., & Zelek, J. (2021). 2D Positional Embedding-based Transformer for Scene Text Recognition. Journal of Computational Vision and Imaging Systems, 6(1), 1–4. https://doi.org/10.15353/jcvis.v6i1.3533
