Encoder-decoder models typically employ only words that occur frequently in the training corpus, in order to reduce computational costs and exclude noise. However, this vocabulary set may still include words that interfere with learning in encoder-decoder models. This paper proposes a method for selecting words that are more suitable for learning encoders by utilizing not only frequency but also co-occurrence information, which we capture using the HITS algorithm. We apply our proposed method to two tasks: machine translation and grammatical error correction. For Japanese-to-English translation, this method achieves a BLEU score that is 0.56 points higher than that of a baseline. Furthermore, it outperforms the baseline method on English grammatical error correction, with an F0.5-measure that is 1.48 points higher.
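The abstract describes ranking candidate vocabulary words by running the HITS algorithm over co-occurrence information rather than relying on frequency alone. The following is a minimal sketch of that general idea, not the paper's actual method: it assumes a simple sliding-window co-occurrence graph and ranks words by HITS authority score. The function names, window-based graph construction, and use of the authority score as the ranking criterion are illustrative assumptions.

```python
# Sketch: HITS-based vocabulary selection over a toy co-occurrence graph.
# The graph construction and scoring here are assumptions for illustration;
# the paper's exact formulation may differ.
import networkx as nx

def cooccurrence_graph(sentences, window=2):
    """Build a directed graph with an edge for each within-window word pair."""
    g = nx.DiGraph()
    for words in sentences:
        for i, w in enumerate(words):
            for v in words[i + 1 : i + 1 + window]:
                g.add_edge(w, v)
    return g

def select_vocabulary(sentences, size):
    """Rank words by HITS authority score and keep the top `size` words."""
    g = cooccurrence_graph(sentences)
    hubs, authorities = nx.hits(g, max_iter=1000)
    ranked = sorted(authorities, key=authorities.get, reverse=True)
    return set(ranked[:size])

corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
print(select_vocabulary(corpus, size=3))
```

The intuition is that a word's usefulness depends not just on how often it appears but on how well connected it is to other informative words; HITS captures this mutual reinforcement, so a word co-occurring with many well-connected words scores higher than an equally frequent but isolated one.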
CITATION STYLE
Katsumata, S., Matsumura, Y., Yamagishi, H., & Komachi, M. (2018). Graph-based filtering of out-of-vocabulary words for encoder-decoder models. In ACL 2018 - 56th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Student Research Workshop (pp. 112–119). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/p18-3016