Graph-based filtering of out-of-vocabulary words for encoder-decoder models

Citations of this article: 1 | Mendeley readers: 71

Abstract

Encoder-decoder models typically employ only words that are frequently used in the training corpus, to reduce computational costs and exclude noise. However, this vocabulary set may still include words that interfere with learning in encoder-decoder models. This paper proposes a method for selecting more suitable words for learning encoders by utilizing not only frequency but also co-occurrence information, which we capture using the HITS algorithm. We apply our proposed method to two tasks: machine translation and grammatical error correction. For Japanese-to-English translation, this method achieves a BLEU score that is 0.56 points higher than that of the baseline. Furthermore, it outperforms the baseline method for English grammatical error correction, with an F0.5-measure that is 1.48 points higher.
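The abstract describes scoring vocabulary words with the HITS algorithm run over co-occurrence information. The paper's own graph construction is not reproduced here; the following is a minimal sketch of standard HITS on a toy directed co-occurrence graph, where the graph, word list, and iteration count are illustrative assumptions only:

```python
# Minimal sketch of the HITS algorithm on a word co-occurrence graph.
# Not the authors' implementation: the toy graph and the fixed number
# of iterations are illustrative assumptions.

def hits(graph, iterations=50):
    """Compute hub and authority scores for a directed graph.

    graph: dict mapping each node to the list of nodes it points to.
    Returns (hubs, authorities) as dicts of L2-normalized scores.
    """
    nodes = set(graph) | {v for targets in graph.values() for v in targets}
    hubs = {n: 1.0 for n in nodes}
    auths = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority update: sum of hub scores of nodes pointing in.
        auths = {n: 0.0 for n in nodes}
        for u, targets in graph.items():
            for v in targets:
                auths[v] += hubs[u]
        norm = sum(a * a for a in auths.values()) ** 0.5 or 1.0
        auths = {n: a / norm for n, a in auths.items()}
        # Hub update: sum of authority scores of nodes pointed to.
        hubs = {u: sum(auths[v] for v in graph.get(u, ())) for u in nodes}
        norm = sum(h * h for h in hubs.values()) ** 0.5 or 1.0
        hubs = {n: h / norm for n, h in hubs.items()}
    return hubs, auths

# Hypothetical co-occurrence edges: word -> words it co-occurs with.
graph = {
    "the": ["cat", "dog"],
    "cat": ["sat"],
    "dog": ["sat"],
    "sat": ["the"],
}
hubs, auths = hits(graph)
# Under a selection scheme like the one the abstract outlines, words
# ranked highly by such scores would be kept in the vocabulary.
top = sorted(auths, key=auths.get, reverse=True)
```

In this toy graph, "sat" receives edges from two hub-like words, so it ends up with the highest authority score; a frequency-only filter would not distinguish it from the other words.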

References

Learning phrase representations using RNN encoder-decoder for statistical machine translation

Authoritative sources in a hyperlinked environment

Neural machine translation of rare words with subword units

Cited by

A Diachronic Assessment of Research on Machine Translation Methodology


CITATION STYLE

APA

Katsumata, S., Matsumura, Y., Yamagishi, H., & Komachi, M. (2018). Graph-based filtering of out-of-vocabulary words for encoder-decoder models. In ACL 2018 - 56th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Student Research Workshop (pp. 112–119). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/p18-3016

Readers over time

(chart: yearly reader counts, '18–'24)

Readers' Seniority

PhD / Post grad / Masters / Doc: 20 (65%)
Researcher: 8 (26%)
Lecturer / Post doc: 2 (6%)
Professor / Associate Prof.: 1 (3%)

Readers' Discipline

Computer Science: 24 (67%)
Linguistics: 8 (22%)
Engineering: 2 (6%)
Business, Management and Accounting: 2 (6%)
