A modified burrows-wheeler transform for highly scalable example-based translation

7Citations
Citations of this article
38Readers
Mendeley users who have this article in their library.
Get full text

Abstract

The Burrows-Wheeler Transform (BWT) was originally developed for data compression, but can also be applied to indexing text. In this paper, an adaptation of the BWT to word-based indexing of the training corpus for an example-based machine translation (EBMT) system is presented. The adapted BWT embeds the necessary information to retrieve matched training instances without requiring any additional space and can be instantiated in a compressed form which reduces disk space and memory requirements by about 40% while still remaining searchable without decompression. Both the speed advantage from O(log N) lookups compared to the O(N) lookups in the inverted-file index which had previously been used and the structure of the index itself act as enablers for additional capabilities and run-time speed. Because the BWT groups all instances of any n-gram together, it can be used to quickly enumerate the most-frequent n-grams, for which translations can be precomputed and stored, resulting in an order-of-magnitude speedup at run time. © Springer-Verlag 2004.

Cite

CITATION STYLE

APA

Brown, R. D. (2004). A modified burrows-wheeler transform for highly scalable example-based translation. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 3265, 27–36. https://doi.org/10.1007/978-3-540-30194-3_4

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free