CHIA: CHoosing Instances to Annotate for Machine Translation

Abstract

Neural machine translation (MT) systems have been shown to perform poorly on low-resource language pairs, for which large-scale parallel data is unavailable. Making the data annotation process faster and cheaper is therefore important to ensure equitable access to MT systems. To make optimal use of a limited annotation budget, we present CHIA (choosing instances to annotate), a method for selecting instances to annotate for machine translation. Using an existing multi-way parallel dataset of high-resource languages, we first identify instances, based on model training dynamics, that are most informative for training MT models for high-resource languages. We find that there are cross-lingual commonalities in instances that are useful for MT model training, which we use to identify instances that will be useful to train models on a new target language. Evaluating on 20 languages from two corpora, we show that training on instances selected using our method provides an average performance improvement of 1.59 BLEU over training on randomly selected instances of the same size.
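The selection idea in the abstract can be illustrated with a minimal sketch. The abstract does not state which training-dynamics statistic is used or how scores are combined across languages, so everything specific here is an assumption made for illustration: the statistic (standard deviation of a per-instance score across epochs), the aggregation (a plain mean over the high-resource languages), and the names `variability` and `select_instances`. It is not the authors' implementation.

```python
from statistics import mean, pstdev


def variability(scores_per_epoch):
    """Spread of one instance's score across training epochs.

    Assumption: standard deviation is used as the training-dynamics
    statistic; the abstract does not name the actual measure.
    """
    return pstdev(scores_per_epoch)


def select_instances(dynamics, budget):
    """Pick `budget` instance indices from a multi-way parallel dataset.

    dynamics: dict mapping a high-resource language code to a list with one
    entry per instance; each entry is that instance's per-epoch score
    (for example, model confidence in the reference translation).
    Assumption: per-language statistics are averaged to form one
    cross-lingual score per instance.
    """
    languages = list(dynamics)
    n_instances = len(dynamics[languages[0]])
    if any(len(dynamics[lang]) != n_instances for lang in languages):
        raise ValueError("all languages must cover the same instances")

    combined = [
        mean(variability(dynamics[lang][i]) for lang in languages)
        for i in range(n_instances)
    ]
    # Rank instances by the combined score, highest first, and keep the top ones.
    ranked = sorted(range(n_instances), key=lambda i: combined[i], reverse=True)
    return sorted(ranked[:budget])


# Toy data: three instances, two high-resource languages, three epochs each.
toy_dynamics = {
    "de": [[0.5, 0.5, 0.5], [0.1, 0.5, 0.9], [0.2, 0.3, 0.4]],
    "fr": [[0.6, 0.6, 0.6], [0.2, 0.6, 0.8], [0.3, 0.3, 0.3]],
}
selected = select_instances(toy_dynamics, budget=2)
```

The selected indices would then identify the source sentences to send for annotation in the new target language, which is possible only because the dataset is multi-way parallel and the same index refers to the same sentence in every language.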

Citation (APA)

Bhatnagar, R., Ganesh, A., & Kann, K. (2022). CHIA: CHoosing Instances to Annotate for Machine Translation. In Findings of the Association for Computational Linguistics: EMNLP 2022 (pp. 7328–7344). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.findings-emnlp.540
