Unknown words in neural machine translation (NMT) not only compromise the semantic integrity of the source sentence but also degrade the generation of the target sentence. Traditional methods usually replace unknown words based on the similarity of word vectors, but these approaches struggle with rare words and polysemous words. This paper therefore proposes a new method for handling unknown words in NMT based on the semantic concepts of the source language. First, we use the semantic concepts in a source-language semantic dictionary to find candidate in-vocabulary words. Second, we propose a semantic similarity measure that integrates the source-language model and the semantic concept network to select the best replacement word. Experiments on an English-to-Chinese translation task show that the proposed method improves on the conventional NMT method by more than 2.6 BLEU points, and on the traditional method based on word-vector similarity by nearly 0.8 BLEU points.
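The two-step procedure described in the abstract can be illustrated with a minimal Python sketch. It rests on several assumptions that are not from the paper: a toy semantic dictionary mapping each word to a set of concept IDs, a toy undirected concept network, a caller-supplied language-model score in [0, 1], a concept similarity of 1 / (1 + hops), and a linear interpolation weight `lam`. All of these, including the names `replace_unknown`, `concept_distances`, and `toy_lm`, are hypothetical and are not the authors' actual resources or formulas.

```python
from collections import deque
from typing import Callable, Dict, List, Optional, Set

# Hypothetical resource formats (not the paper's actual data structures).
SemDict = Dict[str, Set[str]]      # word -> concept IDs
ConceptNet = Dict[str, Set[str]]   # concept -> neighbouring concepts (undirected)
LMScore = Callable[[List[str], int, str], float]  # (tokens, position, word) -> score in [0, 1]


def concept_distances(net: ConceptNet, sources: Set[str], max_hops: int) -> Dict[str, int]:
    """Multi-source BFS: hop distance from the nearest source concept, up to max_hops."""
    dist = {c: 0 for c in sources}
    queue = deque(sources)
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in net.get(node, set()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def replace_unknown(tokens: List[str], pos: int, sem_dict: SemDict, vocab: Set[str],
                    net: ConceptNet, lm: LMScore, lam: float = 0.5,
                    max_hops: int = 2) -> Optional[str]:
    """Pick an in-vocabulary replacement for tokens[pos] (assumed scoring scheme)."""
    dist = concept_distances(net, sem_dict.get(tokens[pos], set()), max_hops)
    best_word, best_score = None, float("-inf")
    for word in sorted(vocab):  # sorted for deterministic tie-breaking
        reachable = [dist[c] for c in sem_dict.get(word, set()) if c in dist]
        if not reachable:
            continue  # Step 1: keep only words whose concepts are near the unknown word's
        concept_sim = 1.0 / (1.0 + min(reachable))
        # Step 2 (assumed form): interpolate concept similarity with the LM score.
        score = lam * concept_sim + (1.0 - lam) * lm(tokens, pos, word)
        if score > best_score:
            best_word, best_score = word, score
    return best_word


# --- Toy usage: all data below is invented for illustration ---
SEM_DICT: SemDict = {
    "physician": {"doctor"},
    "doctor": {"doctor"},
    "nurse": {"nurse"},
    "hospital": {"hospital"},
}
NET: ConceptNet = {
    "doctor": {"medical_worker"},
    "nurse": {"medical_worker"},
    "medical_worker": {"doctor", "nurse"},
}
VOCAB = {"the", "saw", "doctor", "nurse", "hospital"}
BIGRAM = {("the", "doctor"): 0.2, ("the", "nurse"): 0.3, ("the", "hospital"): 0.4}


def toy_lm(tokens: List[str], pos: int, word: str) -> float:
    """Hypothetical bigram score P(word | previous token)."""
    return BIGRAM.get((tokens[pos - 1], word), 0.0) if pos > 0 else 0.0


if __name__ == "__main__":
    sent = ["the", "physician", "saw"]
    print(replace_unknown(sent, 1, SEM_DICT, VOCAB, NET, toy_lm))           # doctor
    print(replace_unknown(sent, 1, SEM_DICT, VOCAB, NET, toy_lm, lam=0.0))  # nurse
```

In this toy setup, `hospital` is never a candidate despite having the highest language-model score, because its concept is not reachable from the unknown word's concept within `max_hops`; this mirrors the filtering role of the first step. With `lam=0.5` the concept term favours `doctor` (0.6 vs. about 0.32 for `nurse`), while `lam=0.0` reduces the choice to the language model alone and picks `nurse`.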
Li, S., Xu, J., Miao, G., Zhang, Y., & Chen, Y. (2018). A semantic concept based unknown words processing method in neural machine translation. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10619 LNAI, pp. 233–242). Springer Verlag. https://doi.org/10.1007/978-3-319-73618-1_20