Data Augmentation Techniques for Machine Translation of Code-Switched Texts: A Comparative Study

Abstract

Code-switching (CSW) text generation has been receiving increasing attention as a solution to data scarcity. In light of this growing interest, more comprehensive studies comparing different augmentation approaches are needed. In this work, we compare three popular approaches: lexical replacement, linguistic theories, and back-translation (BT), in the context of Egyptian Arabic-English CSW. We evaluate the approaches by their effectiveness on machine translation and by the quality of their augmentations, assessed through human evaluation. We show that BT and CSW predictive-based lexical replacement, both of which are trained on CSW parallel data, perform best on both tasks. In the absence of CSW parallel data, linguistic theories and random lexical replacement prove effective, with both approaches achieving similar results.
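To make "random lexical replacement" concrete, here is a minimal sketch of one common generic form of it. It is not the authors' implementation. It assumes word-aligned parallel data (for example, alignments produced by an external word aligner), given as `(source_index, target_index)` pairs. Each aligned source word is swapped for its aligned target-language word(s) with a fixed probability. The function name `random_lexical_replacement` and the `replace_prob` parameter are hypothetical and chosen only for illustration.

```python
import random
from typing import Dict, List, Optional, Tuple


def random_lexical_replacement(
    src_tokens: List[str],
    tgt_tokens: List[str],
    alignments: List[Tuple[int, int]],
    replace_prob: float = 0.3,
    seed: Optional[int] = None,
) -> List[str]:
    """Hypothetical sketch: build a synthetic code-switched sentence by
    replacing aligned source words with their target-language counterparts.

    Assumes `alignments` holds (source_index, target_index) pairs from some
    word aligner; this is an illustrative assumption, not the paper's setup.
    """
    rng = random.Random(seed)

    # Group target indices by source index (one-to-many alignments allowed).
    aligned: Dict[int, List[int]] = {}
    for s, t in alignments:
        aligned.setdefault(s, []).append(t)

    output: List[str] = []
    for i, word in enumerate(src_tokens):
        # Only aligned words are candidates; each is switched independently.
        if i in aligned and rng.random() < replace_prob:
            output.extend(tgt_tokens[j] for j in sorted(aligned[i]))
        else:
            output.append(word)
    return output
```

As a toy usage example, take a romanized Egyptian Arabic sentence `["ana", "bahebb", "el", "ketab"]` aligned one-to-one with `["I", "love", "the", "book"]`. With `replace_prob=1.0`, every word is switched to English. With `replace_prob=0.0`, the sentence is returned unchanged. Intermediate values yield mixed sentences. Because each word is switched independently, this sketch ignores syntactic constraints. Approaches grounded in linguistic theories are designed to add such constraints.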

Citation (APA)

Hamed, I., Habash, N., & Vu, N. T. (2023). Data Augmentation Techniques for Machine Translation of Code-Switched Texts: A Comparative Study. In Findings of the Association for Computational Linguistics: EMNLP 2023 (pp. 140–154). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.findings-emnlp.11
