We present the joint contribution of Instituto Superior Técnico (IST) and Unbabel to the Explainable Quality Estimation (QE) shared task, where systems were submitted to two tracks: constrained (without word-level supervision) and unconstrained (with word-level supervision). For the constrained track, we experimented with several explainability methods to extract the relevance of input tokens from sentence-level QE models built on top of multilingual pre-trained transformers. Among the tested methods, composing explanations in the form of attention weights scaled by the norm of value vectors yielded the best results. When word-level labels are available during training, our best results were obtained by using word-level predicted probabilities. We further improve the performance of our methods on both tracks by ensembling explanation scores extracted from models trained with different pre-trained transformers, achieving strong results for in-domain and zero-shot language pairs.
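To make the best-performing constrained method concrete, below is a minimal NumPy sketch of attention weights scaled by the norm of value vectors. The aggregation choice here (averaging over heads and query positions to get one relevance score per input token) is an illustrative assumption, not necessarily the exact composition used in the submission.

```python
import numpy as np

def value_norm_scaled_attention(attn, values):
    """Per-token relevance from attention scaled by value-vector norms.

    attn:   (heads, seq, seq) attention weights (rows sum to 1)
    values: (heads, seq, d_head) value vectors

    Returns a (seq,) array: for each key position j, the average over
    heads h and query positions i of attn[h, i, j] * ||values[h, j]||.
    This head/query averaging is a hypothetical aggregation choice.
    """
    v_norms = np.linalg.norm(values, axis=-1)   # (heads, seq)
    scaled = attn * v_norms[:, None, :]         # broadcast over queries
    return scaled.mean(axis=(0, 1))             # (seq,)

# Toy usage with random attention and values.
rng = np.random.default_rng(0)
heads, seq, d_head = 4, 6, 8
logits = rng.normal(size=(heads, seq, seq))
attn = np.exp(logits)
attn /= attn.sum(axis=-1, keepdims=True)        # row-normalize to a distribution
values = rng.normal(size=(heads, seq, d_head))
scores = value_norm_scaled_attention(attn, values)
```

Because both the attention weights and the norms are non-negative, the resulting relevance scores are non-negative, which makes them directly usable as word-level explanation scores.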
Treviso, M. V., Guerreiro, N. M., Rei, R., & Martins, A. F. T. (2021). IST-Unbabel 2021 Submission for the Explainable Quality Estimation Shared Task. In Eval4NLP 2021 - Evaluation and Comparison of NLP Systems, Proceedings of the 2nd Workshop (pp. 133–145). Association for Computational Linguistics (ACL). https://doi.org/10.26615/978-954-452-056-4_014