In syntax-based machine translation, it is known that parsing accuracy greatly affects translation accuracy. Self-training, which uses parser output as training data, is one method for improving parser accuracy. However, because parsing errors mix noisy data into the training data, automatically generated parse trees do not always contribute to improving accuracy. In this paper, we propose a method for selecting self-training data. We perform syntax-based machine translation using a variety of parse trees, use automatic evaluation metrics to decide which translation is better, and use that translation's parse tree for parser self-training. This method allows us to automatically choose the trees that contribute to improving translation accuracy, making self-training more effective. In experiments, we found that our self-trained parsers significantly improve the accuracy of a state-of-the-art syntax-based machine translation system in two language pairs.
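The selection step described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The hypothetical `parse_nbest` callable is assumed to return several candidate parse trees for a source sentence. The hypothetical `translate` callable is assumed to return a tokenized translation produced from one tree. An add-one smoothed sentence-level BLEU-style score stands in for "automatic evaluation metrics"; the metric the paper actually uses may differ. The tree whose translation scores highest against the reference is kept as self-training data.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence, Tuple


def ngrams(tokens: Sequence[str], n: int) -> Counter:
    """Count the n-grams of a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def smoothed_sentence_bleu(hyp: Sequence[str], ref: Sequence[str], max_n: int = 4) -> float:
    """Add-one smoothed sentence-level BLEU (an assumed stand-in metric)."""
    if not hyp:
        return 0.0
    log_prec = 0.0
    for n in range(1, max_n + 1):
        hyp_counts, ref_counts = ngrams(hyp, n), ngrams(ref, n)
        # Clipped n-gram matches, as in standard BLEU.
        matches = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = max(len(hyp) - n + 1, 0)
        log_prec += math.log((matches + 1) / (total + 1))
    brevity = min(0.0, 1.0 - len(ref) / len(hyp))  # log brevity penalty
    return math.exp(brevity + log_prec / max_n)


def select_self_training_trees(
    corpus: List[Tuple[str, List[str]]],           # (source sentence, tokenized reference)
    parse_nbest: Callable[[str], List[str]],        # hypothetical: candidate trees
    translate: Callable[[str], List[str]],          # hypothetical: tree -> tokens
    metric: Callable[[Sequence[str], Sequence[str]], float] = smoothed_sentence_bleu,
) -> List[str]:
    """For each sentence, keep the tree whose translation scores best."""
    selected = []
    for source, reference in corpus:
        candidates = parse_nbest(source)
        best_tree = max(candidates, key=lambda tree: metric(translate(tree), reference))
        selected.append(best_tree)
    return selected


if __name__ == "__main__":
    # Toy stand-ins: two candidate "trees" for one sentence and a lookup "translator".
    toy_translations = {
        "(S (NP a) (VP b))": "the cat sleeps".split(),
        "(S (NP (a b)))": "cat the sleeps".split(),
    }
    corpus = [("a b", "the cat sleeps".split())]
    chosen = select_self_training_trees(
        corpus,
        parse_nbest=lambda src: list(toy_translations),
        translate=lambda tree: toy_translations[tree],
    )
    print(chosen)  # → ['(S (NP a) (VP b))']
```

The selected trees would then be added to the parser's training data for retraining. Scoring through the downstream translation, rather than through the parser's own confidence, is what lets this kind of filter favor trees that help translation.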
Morishita, M., Akabe, K., Hatakoshi, Y., Neubig, G., Yoshino, K., & Nakamura, S. (2016). Parser Self-Training for Syntax-Based Machine Translation. Journal of Natural Language Processing, 23(4), 353–376. https://doi.org/10.5715/jnlp.23.353