Domain adaptation is an active field of research in statistical machine translation (SMT), but so far most work has ignored the distinction between the topic and the genre of documents. In this paper we quantify and disentangle the impact of genre and topic differences on translation quality by introducing a new data set with controlled topic and genre distributions. In addition, we perform a detailed analysis showing that differences across topics explain translation performance differences across genres only to a limited degree, and that genre-specific errors are more attributable to model coverage than to suboptimal scoring of translation candidates.
Van Der Wees, M., Bisazza, A., Weerkamp, W., & Monz, C. (2015). What’s in a domain? Analyzing genre and topic differences in statistical machine translation. In ACL-IJCNLP 2015 - 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, Proceedings of the Conference (Vol. 2, pp. 560–566). Association for Computational Linguistics (ACL). https://doi.org/10.3115/v1/p15-2092