Abstract
Translating patents and scientific papers is an important task that statistical machine translation (SMT) can support. In this paper, we propose a method to improve Chinese-Japanese patent SMT by pre-marking the training corpus with aligned bilingual multi-word terms. We automatically extract multi-word terms from monolingual corpora by combining statistical and linguistic filtering methods. We use the sampling-based alignment method to identify aligned terms and set a threshold on translation probabilities to select the most promising bilingual multi-word terms. We then pre-mark a Chinese-Japanese training corpus with the selected aligned bilingual multi-word terms. In experiments on a Chinese-Japanese patent parallel corpus, we obtain over 70% precision in bilingual term extraction and a significant improvement in BLEU scores.
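The selection and pre-marking steps described above might be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `select_terms` and `premark`, the `<term>` and `</term>` marker tokens, the 0.5 threshold, the requirement that both translation directions pass the threshold, and the sample probabilities are all hypothetical. The abstract specifies none of them.

```python
def select_terms(candidates, threshold=0.5):
    """Keep term pairs whose translation probabilities reach the threshold.

    candidates: iterable of (zh_term, ja_term, p_zh_to_ja, p_ja_to_zh).
    Requiring both directions to pass is an assumption of this sketch.
    """
    return [
        (zh, ja)
        for zh, ja, p_zh_ja, p_ja_zh in candidates
        if p_zh_ja >= threshold and p_ja_zh >= threshold
    ]


def premark(tokens, terms, open_tag="<term>", close_tag="</term>"):
    """Wrap the longest matching multi-word term at each position in markers.

    tokens: list of word tokens from one segmented sentence.
    terms: iterable of token lists, one per multi-word term.
    The marker tokens are hypothetical placeholders.
    """
    term_set = {tuple(t) for t in terms}
    max_len = max((len(t) for t in term_set), default=0)
    out = []
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest span first; multi-word terms have at least 2 tokens.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            if tuple(tokens[i:i + n]) in term_set:
                out.append(open_tag)
                out.extend(tokens[i:i + n])
                out.append(close_tag)
                i += n
                matched = True
                break
        if not matched:
            out.append(tokens[i])
            i += 1
    return out


# Illustrative candidates with made-up probabilities.
candidates = [
    ("半导体 装置", "半導体 装置", 0.9, 0.8),
    ("控制 电路", "制御 回路", 0.4, 0.7),
]
selected = select_terms(candidates, threshold=0.5)
zh_terms = [zh.split() for zh, _ in selected]
marked = premark("该 半导体 装置 包括".split(), zh_terms)
print(" ".join(marked))
```

The same `premark` call would be applied to the Japanese side of each sentence pair using the Japanese halves of the selected terms, so that both sides of the training corpus carry matching markers.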
Cite
Yang, W., Yan, J., & Lepage, Y. (2016). Extraction of bilingual technical terms for Chinese-Japanese patent translation. In HLT-NAACL 2016 - 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Student Research Workshop (pp. 81–87). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/n16-2012