Abstract
Adding linguistic information (syntax or semantics) to neural machine translation (NMT) has mostly focused on using point estimates from pre-trained models. Directly using the capacity of massive pre-trained contextual word embedding models such as BERT (Devlin et al., 2019) has been marginally useful in NMT because effective fine-tuning is difficult to obtain for NMT without making training brittle and unreliable. We augment NMT by extracting dense fine-tuned vector-based linguistic information from BERT instead of using point estimates. Experimental results show that our method of incorporating linguistic information helps NMT to generalize better in a variety of training contexts and is no more difficult to train than conventional Transformer-based NMT.
Cite
Shavarani, H. S., & Sarkar, A. (2021). Better neural machine translation by extracting linguistic information from BERT. In EACL 2021 - 16th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings of the Conference (pp. 2772–2783). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2021.eacl-main.241