Improving statistical machine translation with a multilingual Paraphrase Database

Abstract

The multilingual Paraphrase Database (PPDB) is a freely available, automatically created resource of paraphrases in multiple languages. In statistical machine translation, paraphrases can be used to provide translations for out-of-vocabulary (OOV) phrases. In this paper, we show that a graph propagation approach that uses PPDB paraphrases can improve overall translation quality. We provide an extensive comparison with previous work and show that our PPDB-based method improves the BLEU score by up to 1.79 percentage points. We show that our approach improves on the state of the art in three different settings: when parallel training data is limited, when there is a domain shift between training and test data, and when the source language is morphologically complex. Our PPDB-based method outperforms the use of distributional profiles from monolingual source data.
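The abstract does not spell out the propagation algorithm, so the following is only a generic sketch of the idea of graph propagation over paraphrases, not the authors' method. It assumes a hypothetical toy graph in which an OOV phrase is linked to known phrases by paraphrase weights, and it spreads the known phrases' translation distributions to the OOV phrase as a weighted, normalized average. The function name `propagate`, the weights, and the example phrases are all invented for illustration.

```python
from collections import defaultdict


def propagate(graph, seeds, iterations=3):
    """Spread translation distributions from known phrases to unlabeled ones.

    graph: phrase -> {neighbor phrase: paraphrase weight} (hypothetical scores)
    seeds: phrase -> {translation: probability} (phrases with known translations)
    """
    labels = {phrase: dict(dist) for phrase, dist in seeds.items()}
    for _ in range(iterations):
        updated = dict(labels)
        for node, neighbors in graph.items():
            if node in seeds:
                continue  # known phrases keep their original distributions
            scores = defaultdict(float)
            for neighbor, weight in neighbors.items():
                # Each labeled neighbor votes for its translations,
                # scaled by the paraphrase weight of the edge.
                for translation, prob in labels.get(neighbor, {}).items():
                    scores[translation] += weight * prob
            total = sum(scores.values())
            if total > 0:
                updated[node] = {t: s / total for t, s in scores.items()}
        labels = updated
    return labels


# Toy data (assumed, not from the paper): "automobile" is the OOV phrase.
graph = {"automobile": {"car": 0.75, "vehicle": 0.25}}
seeds = {"car": {"Auto": 1.0}, "vehicle": {"Fahrzeug": 1.0}}

result = propagate(graph, seeds)
print(result["automobile"])
```

In this toy setting the OOV phrase inherits a translation distribution that mirrors its paraphrase weights; a real system would build the graph from PPDB entries and feed the resulting translations into the phrase table.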

Cite

Seraj, R. M., Siahbani, M., & Sarkar, A. (2015). Improving statistical machine translation with a multilingual Paraphrase Database. In Conference Proceedings - EMNLP 2015: Conference on Empirical Methods in Natural Language Processing (pp. 1379–1390). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/d15-1163