What's in a domain? Analyzing genre and topic differences in statistical machine translation

Marlies Van Der Wees; Arianna Bisazza; Wouter Weerkamp; Christof Monz

Conference Proceedings, Open Access

ACL-IJCNLP 2015 - 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, Proceedings of the Conference (2015) 2 560-566

DOI: 10.3115/v1/p15-2092

Abstract

Domain adaptation is an active field of research in statistical machine translation (SMT), but so far most work has ignored the distinction between the topic and genre of documents. In this paper we quantify and disentangle the impact of genre and topic differences on translation quality by introducing a new data set that has controlled topic and genre distributions. In addition, we perform a detailed analysis showing that differences across topics explain only a limited share of the translation performance differences across genres, and that genre-specific errors are more attributable to model coverage than to suboptimal scoring of translation candidates.

Citation (APA)

Van Der Wees, M., Bisazza, A., Weerkamp, W., & Monz, C. (2015). What’s in a domain? Analyzing genre and topic differences in statistical machine translation. In ACL-IJCNLP 2015 - 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, Proceedings of the Conference (Vol. 2, pp. 560–566). Association for Computational Linguistics (ACL). https://doi.org/10.3115/v1/p15-2092
