Transferring General Multimodal Pretrained Models to Text Recognition


Abstract

This paper proposes a new method, OFA-OCR, to transfer multimodal pretrained models to text recognition. Specifically, we recast text recognition as image captioning and directly transfer a unified vision-language pretrained model to the end task. Without pretraining on large-scale annotated or synthetic text recognition data, OFA-OCR outperforms the baselines and achieves state-of-the-art performance on the Chinese text recognition benchmark. Additionally, we construct an OCR pipeline with OFA-OCR and demonstrate that it achieves performance competitive with a product-level API. The code and demo are publicly available.
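The captioning formulation the abstract describes is straightforward to express in code: a cropped text image goes into a vision encoder, and a text decoder autoregressively generates the transcription, just as it would generate a caption. The sketch below is a minimal illustration using Hugging Face's VisionEncoderDecoderModel with the public TrOCR checkpoint as a stand-in; it is not the authors' OFA-OCR checkpoint or training recipe, and the image path is hypothetical.

```python
from PIL import Image
from transformers import TrOCRProcessor, VisionEncoderDecoderModel

# Load a public encoder-decoder recognition model as a stand-in for the
# paper's unified vision-language model (OFA).
processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-printed")
model = VisionEncoderDecoderModel.from_pretrained("microsoft/trocr-base-printed")

# Hypothetical path: a cropped text region, e.g. from a detection stage.
image = Image.open("cropped_text_region.png").convert("RGB")
pixel_values = processor(images=image, return_tensors="pt").pixel_values

# Autoregressively decode the transcription, exactly as in image captioning.
generated_ids = model.generate(pixel_values)
text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(text)
```

In a full OCR pipeline like the one the abstract mentions, a detection stage would presumably first localize and crop text regions, with each crop then transcribed by the recognition model in this manner.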

Cite

APA

Lin, J., Ren, X., Zhang, Y., Liu, G., Wang, P., Yang, A., & Zhou, C. (2023). Transferring General Multimodal Pretrained Models to Text Recognition. In Findings of the Association for Computational Linguistics: ACL 2023 (pp. 588–597). Association for Computational Linguistics. https://doi.org/10.18653/v1/2023.findings-acl.37
