Abstract
Quality Estimation (QE) for Machine Translation has been shown to reach relatively high accuracy in predicting sentence-level scores, relying on pretrained contextual embeddings and human-produced quality scores. However, the lack of explanations along with decisions made by end-to-end neural models makes the results difficult to interpret. Furthermore, word-level annotated datasets are rare due to the prohibitive effort required to perform this task, while they could provide interpretable signals in addition to sentence-level QE outputs. In this paper, we propose a novel QE architecture which tackles both the word-level data scarcity and the interpretability limitations of recent approaches. Sentence-level and word-level components are jointly pretrained through an attention mechanism based on synthetic data and a set of MT metrics embedded in a common space. Our approach is evaluated on the Eval4NLP 2021 shared task and our submissions reach the first position in all language pairs. The extraction of metric-to-input attention weights show that different metrics focus on different parts of the source and target text, providing strong rationales in the decision-making process of the QE model.
Cite
CITATION STYLE
Rubino, R., Fujita, A., & Marie, B. (2021). Error Identification for Machine Translation with Metric Embedding and Attention. In Eval4NLP 2021 - Evaluation and Comparison of NLP Systems, Proceedings of the 2nd Workshop (pp. 146–156). Association for Computational Linguistics (ACL). https://doi.org/10.26615/978-954-452-056-4_015
Register to see more suggestions
Mendeley helps you to discover research relevant for your work.