Abstract
Mask-predict CMLM (Ghazvininejad et al., 2019) has achieved strong performance among non-autoregressive NMT models, but we find that predicting all target words based only on the hidden states of [MASK] tokens is neither effective nor efficient in the initial iterations of refinement, resulting in ungrammatical repetitions and slow convergence. In this work, we mitigate this problem by combining the copied source with [MASK] embeddings in the decoder input. Notably, this is not straightforward copying, which has been shown to be ineffective, but a novel heuristic hybrid strategy, fence-mask. Experimental results show consistent gains on both the WMT14 En↔De and WMT16 En↔Ro corpora: 0.5 BLEU on average, and 1 BLEU for less-informative short sentences. This indicates that incorporating additional information through a suitable strategy improves CMLM, particularly the translation quality of short texts, and speeds up early-stage convergence.
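The abstract does not spell out the fence-mask pattern. One plausible reading is that the decoder input interleaves embeddings copied from the source with [MASK] embeddings, like pickets in a fence. The PyTorch sketch below is a minimal illustration under that assumption; the function name `fence_mask_input`, the even/odd alternation, and the uniform copy of source embeddings to target positions are all hypothetical, not the authors' implementation.

```python
# Hypothetical sketch of a "fence-mask" hybrid decoder input: positions
# alternate between an embedding copied from the source and the [MASK]
# embedding. The alternation pattern and the copy scheme are assumptions.

import torch

def fence_mask_input(src_embed: torch.Tensor, mask_embed: torch.Tensor) -> torch.Tensor:
    """Build the hybrid decoder input for one sentence.

    src_embed:  (tgt_len, dim) source embeddings already copied to target
                positions (e.g. via a uniform copy along the length ratio).
    mask_embed: (dim,) embedding of the [MASK] token.
    Returns:    (tgt_len, dim) hybrid input where even positions keep the
                copied source embedding and odd positions use [MASK].
    """
    tgt_len, dim = src_embed.shape
    out = mask_embed.expand(tgt_len, dim).clone()  # start from all-[MASK]
    out[0::2] = src_embed[0::2]                    # "fence" pattern: copy every other position
    return out

# Usage: random stand-ins for the copied source embeddings and the decoder's
# [MASK] embedding row.
src = torch.randn(7, 512)
mask = torch.randn(512)
dec_in = fence_mask_input(src, mask)
print(dec_in.shape)  # torch.Size([7, 512])
```

Compared with the pure-[MASK] input of the original CMLM, a hybrid input of this kind gives early refinement iterations lexical anchors from the source instead of uniform mask states.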
Citation
Wang, M., Guo, J., Wang, Y., Chen, Y., Su, C., Wei, D., … Yang, H. (2021). HI-CMLM: Improve CMLM with Hybrid Decoder Input. In INLG 2021 - 14th International Conference on Natural Language Generation, Proceedings (pp. 167–171). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2021.inlg-1.16