Improving statistical machine translation with a multilingual Paraphrase Database

Ramtin Mehdizadeh Seraj; Maryam Siahbani; Anoop Sarkar

Conference Proceedings - EMNLP 2015: Conference on Empirical Methods in Natural Language Processing (2015) 1379-1390

DOI: 10.18653/v1/d15-1163

Abstract

The multilingual Paraphrase Database (PPDB) is a freely available, automatically created resource of paraphrases in multiple languages. In statistical machine translation, paraphrases can be used to provide translations for out-of-vocabulary (OOV) phrases. In this paper, we show that a graph propagation approach that uses PPDB paraphrases can be used to improve overall translation quality. We provide an extensive comparison with previous work and show that our PPDB-based method improves the BLEU score by up to 1.79 percentage points. We show that our approach improves on the state of the art in three different settings: a limited amount of parallel training data; a domain shift between training and test data; and a morphologically complex source language. Our PPDB-based method outperforms the use of distributional profiles from monolingual source data.
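
The abstract does not spell out the graph propagation step, so the following is only a minimal sketch of generic label propagation over a paraphrase graph, not the authors' algorithm. It assumes phrases are nodes, paraphrase scores are edge weights, and phrases with known translations act as fixed seeds; the function name `propagate`, the toy phrases, and the placeholder translations `t1` and `t2` are all hypothetical.

```python
from collections import defaultdict


def propagate(edges, seeds, iterations=10):
    """Spread translation distributions from seed phrases to their paraphrases.

    edges: {phrase: {neighbor: weight}}   paraphrase graph (hypothetical weights)
    seeds: {phrase: {translation: prob}}  phrases with known translations
    """
    labels = {phrase: dict(dist) for phrase, dist in seeds.items()}
    for _ in range(iterations):
        updated = {}
        for node, neighbors in edges.items():
            if node in seeds:
                continue  # seed distributions stay fixed (clamped)
            scores = defaultdict(float)
            total = 0.0
            for neighbor, weight in neighbors.items():
                dist = labels.get(neighbor)
                if not dist:
                    continue  # neighbor has no translations yet
                total += weight
                for translation, prob in dist.items():
                    scores[translation] += weight * prob
            if total > 0:
                # Weighted average of the neighbors' distributions.
                updated[node] = {t: s / total for t, s in scores.items()}
        labels.update(updated)
    return labels


# Toy graph: "oov" has no translation of its own but has two paraphrases.
edges = {
    "oov": {"a": 0.75, "b": 0.25},
    "a": {"oov": 0.75},
    "b": {"oov": 0.25},
}
seeds = {"a": {"t1": 1.0}, "b": {"t2": 1.0}}
result = propagate(edges, seeds)
print(result["oov"])
```

In this toy setup the OOV phrase inherits a translation distribution weighted by how strongly it is connected to each paraphrase. A real system would additionally have to build the graph from PPDB entries and integrate the new phrase pairs into the translation model.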

Cite

Seraj, R. M., Siahbani, M., & Sarkar, A. (2015). Improving statistical machine translation with a multilingual Paraphrase Database. In Conference Proceedings - EMNLP 2015: Conference on Empirical Methods in Natural Language Processing (pp. 1379–1390). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/d15-1163
