Abstract
As larger and more diverse parallel texts become available, how can we leverage heterogeneous data to train robust machine translation systems that achieve good translation quality across a variety of test domains? This challenge has so far been addressed by repurposing techniques developed for domain adaptation, such as linear mixture models, which combine estimates learned on homogeneous subdomains. However, learning from large heterogeneous corpora is quite different from standard adaptation tasks with clear domain distinctions. In this paper, we show that linear mixture models can reliably improve translation quality in very heterogeneous training conditions, even when the mixtures use no domain knowledge and aim to learn generic models rather than adapt them to the target domain. This surprising finding opens new perspectives for using mixture models in machine translation beyond clear-cut domain adaptation tasks.
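For context, a linear mixture in this setting interpolates per-subdomain model estimates using non-negative weights that sum to one. The sketch below uses generic symbols (K subdomains, weights λ_k) to illustrate the general form only; it is not the paper's exact parameterization:

    p(e | f) = Σ_{k=1..K} λ_k · p_k(e | f),   with λ_k ≥ 0 and Σ_k λ_k = 1,

where p_k(e | f) is the translation (or language model) estimate learned on subdomain k, and the interpolation weights λ_k are typically tuned on held-out data.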
Citation
Carpuat, M., Goutte, C., & Foster, G. (2014). Linear mixture models for robust machine translation. In Proceedings of the Ninth Workshop on Statistical Machine Translation (pp. 499–509). Association for Computational Linguistics. https://doi.org/10.3115/v1/w14-3363