Speeding up neural machine translation decoding by shrinking run-time vocabulary

Xing Shi; Kevin Knight

Conference Proceedings, Open Access

ACL 2017 - 55th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers) (2017) 2 574-579

DOI: 10.18653/v1/P17-2091

17 citations, 108 readers

Abstract

We speed up Neural Machine Translation (NMT) decoding by shrinking the run-time target vocabulary. We experiment with two shrinking approaches: Locality Sensitive Hashing (LSH) and word alignments. Using the latter method, we get a 2x overall speed-up over a highly optimized GPU implementation, without hurting BLEU. On certain low-resource language pairs, the same methods improve BLEU by 0.5 points. We also report a negative result for LSH on GPUs, due to relatively large overhead, though it was successful on CPUs. Compared with LSH, decoding with word alignments is GPU-friendly, orthogonal to existing speedup methods, and more robust across language pairs.
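The abstract does not spell out the mechanics, so the following is only a minimal sketch of the general idea of alignment-based vocabulary shrinking, not the authors' implementation. It assumes a hypothetical `align_table` that maps each source word to a few likely target word ids (for example, taken from a word aligner), and a hypothetical `always_keep` list of ids that stay in every candidate set. The softmax is then computed over the candidate rows of the output layer only, rather than over the full target vocabulary.

```python
import math


def candidate_vocab(source_tokens, align_table, always_keep):
    """Union of target ids aligned to each source token, plus ids always kept.

    align_table and always_keep are illustrative assumptions, not names
    taken from the paper.
    """
    candidates = set(always_keep)
    for tok in source_tokens:
        # Source words missing from the table contribute no candidates.
        candidates.update(align_table.get(tok, ()))
    return sorted(candidates)


def restricted_softmax(hidden, output_rows, candidates):
    """Softmax over only the candidate rows of the output layer."""
    logits = [
        sum(h * w for h, w in zip(hidden, output_rows[i])) for i in candidates
    ]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return {i: e / total for i, e in zip(candidates, exps)}


# Toy data: an 8-word target vocabulary with hidden size 2.
align_table = {"chat": [3, 4], "noir": [5]}
always_keep = [0, 1]  # e.g. end-of-sentence and unknown-word ids (assumed)
output_rows = [[float(i), 1.0] for i in range(8)]

cands = candidate_vocab(["chat", "noir"], align_table, always_keep)
probs = restricted_softmax([0.5, 0.0], output_rows, cands)
```

In this toy setup the output layer scores 5 candidate rows instead of 8. With a realistic target vocabulary of tens of thousands of words, the candidate set for one sentence is typically far smaller than the full vocabulary, which is where a decoding speed-up of this kind would come from.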

Cite

Shi, X., & Knight, K. (2017). Speeding up neural machine translation decoding by shrinking run-time vocabulary. In ACL 2017 - 55th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers) (Vol. 2, pp. 574–579). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/P17-2091
