How many bits are needed to store probabilities for phrase-based translation?

Citations: 19
Readers (Mendeley): 82

Abstract

State of the art in statistical machine translation is currently represented by phrase-based models, which typically incorporate a large number of probabilities of phrase-pairs and word n-grams. In this work, we investigate data compression methods for efficiently encoding n-gram and phrase-pair probabilities, which are usually stored as 32-bit floating point numbers. We measured the impact of compression on translation quality through a phrase-based decoder trained on two distinct tasks: the translation of European Parliament speeches from Spanish to English, and the translation of news agency stories from Chinese to English. We show that with a very simple quantization scheme all probabilities can be encoded in just 4 bits, with a relative loss in BLEU score on the two tasks of only 1.0% and 1.6%, respectively.
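
The abstract does not spell out the quantization scheme itself. As a rough, hypothetical illustration of how 32-bit probabilities can be replaced by 4-bit codes plus a small codebook, the Python sketch below bins log-probabilities into 2^4 equally populated levels and keeps only the bin index per entry; the function names and the specific binning choice are assumptions for illustration, not the authors' exact method.

```python
import numpy as np

def build_codebook(probs, bits=4):
    """Quantize a table of probabilities into 2**bits levels.

    Illustrative 'binning' quantizer: sort the log-probabilities,
    split them into equally populated bins, and use each bin's mean
    as its centroid. (A sketch only; other quantizers, e.g. Lloyd /
    k-means, could be substituted.)
    """
    levels = 2 ** bits
    logp = np.log(np.asarray(probs, dtype=np.float32))
    order = np.argsort(logp)
    bins = np.array_split(order, levels)          # equally populated bins
    codebook = np.array([logp[b].mean() for b in bins], dtype=np.float32)
    codes = np.empty(len(logp), dtype=np.uint8)   # 4-bit code per entry
    for idx, b in enumerate(bins):
        codes[b] = idx
    return codebook, codes

def decode(codebook, codes):
    """Map codes back to approximate probabilities via the codebook."""
    return np.exp(codebook[codes])

# Usage: quantize a toy phrase-translation probability table.
probs = np.random.dirichlet(np.ones(1000))
codebook, codes = build_codebook(probs, bits=4)
approx = decode(codebook, codes)
print("max abs error:", np.abs(approx - probs).max())
```

The memory saving comes from storing only the 4-bit index per probability plus a 16-entry floating-point codebook per table; decoding a probability is then a single codebook lookup.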

Citation (APA)

Federico, M., & Bertoldi, N. (2006). How many bits are needed to store probabilities for phrase-based translation? In HLT-NAACL 2006 - Statistical Machine Translation, Proceedings of the Workshop (pp. 94–101). Association for Computational Linguistics (ACL). https://doi.org/10.3115/1654650.1654664

Readers' Seniority

PhD / Post grad / Masters / Doc: 24 (62%)
Researcher: 9 (23%)
Professor / Associate Prof.: 3 (8%)
Lecturer / Post doc: 3 (8%)

Readers' Discipline

Computer Science: 31 (74%)
Linguistics: 7 (17%)
Engineering: 2 (5%)
Arts and Humanities: 2 (5%)
