Error Identification for Machine Translation with Metric Embedding and Attention

Abstract

Quality Estimation (QE) for Machine Translation has been shown to reach relatively high accuracy in predicting sentence-level scores, relying on pretrained contextual embeddings and human-produced quality scores. However, end-to-end neural models provide no explanations for their decisions, which makes the results difficult to interpret. Furthermore, word-level annotated datasets, which could provide interpretable signals in addition to sentence-level QE outputs, are rare because of the prohibitive effort required to produce them. In this paper, we propose a novel QE architecture which tackles both the word-level data scarcity and the interpretability limitations of recent approaches. Sentence-level and word-level components are jointly pretrained through an attention mechanism based on synthetic data and a set of MT metrics embedded in a common space. Our approach is evaluated on the Eval4NLP 2021 shared task and our submissions reach the first position in all language pairs. The extraction of metric-to-input attention weights shows that different metrics focus on different parts of the source and target text, providing strong rationales for the decision-making process of the QE model.

Citation (APA)

Rubino, R., Fujita, A., & Marie, B. (2021). Error Identification for Machine Translation with Metric Embedding and Attention. In Eval4NLP 2021 - Evaluation and Comparison of NLP Systems, Proceedings of the 2nd Workshop (pp. 146–156). Association for Computational Linguistics (ACL). https://doi.org/10.26615/978-954-452-056-4_015
