Graph-based filtering of out-of-vocabulary words for encoder-decoder models

Citations: 1 · Mendeley readers: 71

Abstract

Encoder-decoder models typically employ only words that are frequently used in the training corpus, in order to reduce computational costs and exclude noise. However, this vocabulary set may still include words that interfere with learning in encoder-decoder models. This paper proposes a method for selecting more suitable words for learning encoders by utilizing not only frequency but also co-occurrence information, which we capture using the HITS algorithm. We apply our proposed method to two tasks: machine translation and grammatical error correction. For Japanese-to-English translation, this method achieves a BLEU score 0.56 points higher than that of a baseline. Furthermore, it outperforms the baseline method for English grammatical error correction, with an F0.5-measure 1.48 points higher.
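The abstract's key idea is to score vocabulary words by co-occurrence using the HITS algorithm, which assigns each node in a directed graph a hub score and an authority score via power iteration. The sketch below is a minimal, generic HITS implementation on an adjacency matrix, not the paper's actual code; treating words as nodes and co-occurrence links as edges is an assumption made here for illustration only.

```python
import numpy as np

def hits(adj, iters=50):
    """Minimal HITS power iteration on a directed graph.

    adj[i, j] = 1 means node i links to node j (illustratively, a
    directed co-occurrence edge between two words).  Returns
    (hub, authority) score vectors, each L1-normalized.
    """
    n = adj.shape[0]
    hub = np.ones(n)
    auth = np.ones(n)
    for _ in range(iters):
        # A node's authority grows when good hubs point to it.
        auth = adj.T @ hub
        auth /= auth.sum()
        # A node's hub score grows when it points to good authorities.
        hub = adj @ auth
        hub /= hub.sum()
    return hub, auth

# Toy example: nodes 0 and 1 both point to node 2.
adj = np.array([[0, 0, 1],
                [0, 0, 1],
                [0, 0, 0]], dtype=float)
hub, auth = hits(adj)
```

In this toy graph, node 2 receives the highest authority score because every hub points to it; a frequency-only filter would ignore this link structure entirely, which is the gap the proposed method targets.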

Citation (APA)

Katsumata, S., Matsumura, Y., Yamagishi, H., & Komachi, M. (2018). Graph-based filtering of out-of-vocabulary words for encoder-decoder models. In ACL 2018 - 56th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Student Research Workshop (pp. 112–119). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/p18-3016
