Synergy with Translation Artifacts for Training and Inference in Multilingual Tasks

Abstract

Translation has played a crucial role in improving performance on multilingual tasks in two ways: (1) generating target-language data from source-language data for training, and (2) generating source-language data from target-language data for inference. However, prior work has not considered using both translations simultaneously. This paper shows that combining them yields synergistic gains on various multilingual sentence classification tasks. We empirically find that translation artifacts, stylized by translators, are the main factor behind the performance gain. Based on this analysis, we adopt two training methods, SupCon and MixUp, that take translation artifacts into account. Furthermore, we propose a cross-lingual fine-tuning algorithm called MUSC, which uses SupCon and MixUp jointly and improves performance. Our code is available at https://github.com/jongwooko/MUSC.
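
As a rough illustration of the MixUp idea named in the abstract, the sketch below linearly interpolates two feature vectors and their labels with a coefficient drawn from a Beta distribution. It shows generic MixUp only, not the MUSC algorithm. The function names, the toy vectors standing in for an original sentence and its translated counterpart, and the value of alpha are all assumptions made for illustration; the authors' actual implementation is in the repository linked above.

```python
import random


def mixup(x_a, x_b, y_a, y_b, lam):
    """Linearly interpolate two feature vectors and their label vectors."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    x_mix = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y_mix = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x_mix, y_mix


def sample_lambda(alpha=0.4, rng=random):
    """Draw a mixing coefficient from Beta(alpha, alpha); alpha is an assumed value."""
    return rng.betavariate(alpha, alpha)


# Hypothetical toy features: one for an original sentence, one for its translation.
original_vec = [1.0, 0.0, 2.0]
translated_vec = [0.0, 1.0, 2.0]
label = [0.0, 1.0]  # both views share the same one-hot class label

# A fixed coefficient keeps this example deterministic.
x_mix, y_mix = mixup(original_vec, translated_vec, label, label, lam=0.5)
```

In training, the coefficient would typically come from sample_lambda() rather than a fixed value. Because both views here carry the same label, the mixed label is unchanged and only the features are blended.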

Cite

Oh, J., Ko, J., & Yun, S. Y. (2022). Synergy with Translation Artifacts for Training and Inference in Multilingual Tasks. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022 (pp. 6747–6754). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.emnlp-main.452
