In-Image Neural Machine Translation with Segmented Pixel Sequence-to-Sequence Model


Abstract

In-Image Machine Translation (IIMT) aims to convert images containing text from one language to another. Traditional approaches to this task are cascade methods that apply optical character recognition (OCR), followed by neural machine translation (NMT) and text rendering. However, cascade methods suffer from the compounding errors of OCR and NMT, which degrade translation quality. In this paper, we propose an end-to-end model that replaces the OCR, NMT, and text-rendering pipeline. Our neural architecture adopts an encoder-decoder paradigm with segmented pixel sequences as inputs and outputs. Through end-to-end training, our model yields improvements across several dimensions: (i) it achieves higher translation quality by avoiding error propagation, (ii) it is robust to out-of-domain data, and (iii) it is insensitive to incomplete words. To validate the effectiveness of our method and to support future research, we construct a dataset containing 4M pairs of De-En images and train our end-to-end model on it. The experimental results show that our approach outperforms both the cascade method and the current end-to-end model.
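To make the "segmented pixel sequence" idea concrete, the sketch below splits a text-line image into fixed-width vertical slices and flattens each slice into a vector, yielding the kind of pixel-token sequence an encoder-decoder could consume. This is a minimal illustration under assumptions of ours: the function name, the slice width of 8, and zero-padding are illustrative choices, not the paper's exact segmentation scheme.

```python
import numpy as np

def segment_image(image, slice_width=8):
    """Split an H x W grayscale image into a sequence of flattened
    vertical slices of width `slice_width` (zero-padded on the right),
    returning an array of shape (num_slices, H * slice_width)."""
    h, w = image.shape
    pad = (-w) % slice_width                      # columns needed to reach a multiple
    padded = np.pad(image, ((0, 0), (0, pad)))    # pad only on the right edge
    n = padded.shape[1] // slice_width            # number of slices (sequence length)
    # Group contiguous columns into slices, then flatten each slice.
    return padded.reshape(h, n, slice_width).transpose(1, 0, 2).reshape(n, h * slice_width)

# Example: a 32 x 100 "image" becomes a sequence of 13 vectors of length 256.
img = np.arange(32 * 100, dtype=np.float32).reshape(32, 100)
seq = segment_image(img)
print(seq.shape)  # (13, 256)
```

Each row of `seq` would then be embedded and fed to the encoder, with the decoder emitting an analogous slice sequence for the target-language image.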


Citation (APA)

Tian, Y., Li, X., Liu, Z., Guo, Y., & Wang, B. (2023). In-Image Neural Machine Translation with Segmented Pixel Sequence-to-Sequence Model. In Findings of the Association for Computational Linguistics: EMNLP 2023 (pp. 15046–15057). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.findings-emnlp.1004

