Improving statistical machine translation with a multilingual Paraphrase Database

Abstract

The multilingual Paraphrase Database (PPDB) is a freely available, automatically created resource of paraphrases in multiple languages. In statistical machine translation, paraphrases can be used to provide translations for out-of-vocabulary (OOV) phrases. In this paper, we show that a graph propagation approach that uses PPDB paraphrases can improve overall translation quality. We provide an extensive comparison with previous work and show that our PPDB-based method improves the BLEU score by up to 1.79 percentage points. We show that our approach improves on the state of the art in three different settings: limited parallel training data, a domain shift between training and test data, and a morphologically complex source language. Our PPDB-based method outperforms the use of distributional profiles from monolingual source data.
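
To make the idea of graph propagation concrete, the toy sketch below spreads translation distributions from phrases with known translations to OOV phrases along weighted paraphrase edges. It is only an illustration and not the paper's algorithm: the function name `propagate`, the example phrases, the edge weights, and the plain weighted-average update are all assumptions chosen for brevity.

```python
from collections import defaultdict


def propagate(edges, seeds, iterations=10):
    """Spread translation distributions over a paraphrase graph (toy sketch).

    edges: {phrase: {neighbor_phrase: paraphrase_weight}}  (hypothetical weights)
    seeds: {phrase: {translation: probability}} for phrases with known translations
    """
    labels = {node: dict(dist) for node, dist in seeds.items()}
    for _ in range(iterations):
        updated = {}
        for node, neighbors in edges.items():
            if node in seeds:
                continue  # known translations stay fixed
            scores = defaultdict(float)
            total = 0.0
            for neighbor, weight in neighbors.items():
                dist = labels.get(neighbor)
                if not dist:
                    continue  # neighbor has no translations yet
                total += weight
                for translation, prob in dist.items():
                    scores[translation] += weight * prob
            if total > 0:
                # weighted average of the neighbors' distributions
                updated[node] = {t: s / total for t, s in scores.items()}
        labels.update(updated)  # synchronous update after each pass
    return labels


# Invented example: two phrases with known translations, two OOV phrases.
seeds = {
    "voiture": {"car": 1.0},
    "automobile": {"car": 0.6, "automobile": 0.4},
}
edges = {
    "bagnole": {"voiture": 0.8, "automobile": 0.2},  # OOV, one hop from the seeds
    "caisse": {"bagnole": 1.0},                      # OOV, two hops from the seeds
}
result = propagate(edges, seeds)
print(result["bagnole"])  # roughly car 0.92, automobile 0.08
print(result["caisse"])   # inherits the same distribution via "bagnole"
```

In this sketch "caisse" has no directly labeled neighbor, so it receives candidates only on the second pass, after "bagnole" has been filled in. That multi-hop behavior is what distinguishes propagation over a graph from a single paraphrase lookup.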

Seraj, R. M., Siahbani, M., & Sarkar, A. (2015). Improving statistical machine translation with a multilingual Paraphrase Database. In Conference Proceedings - EMNLP 2015: Conference on Empirical Methods in Natural Language Processing (pp. 1379–1390). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/d15-1163