Abstract
Noise and domain are important aspects of data quality for neural machine translation. Existing research focuses separately on domain-data selection, clean-data selection, or their static combination, leaving the dynamic interaction between them not explicitly examined. This paper introduces a “co-curricular learning” method that composes dynamic domain-data selection with dynamic clean-data selection, enabling transfer learning across both capabilities. We apply an EM-style optimization procedure to further refine the “co-curriculum”. Experimental results and analysis on two domains demonstrate the effectiveness of the method and the properties of the data scheduled by the co-curriculum.
Cite
Wang, W., Caswell, I., & Chelba, C. (2020). Dynamically composing domain-data selection with clean-data selection by “co-curricular learning” for neural machine translation. In ACL 2019 - 57th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (pp. 1282–1292). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/p19-1123