Although Statistical Machine Translation (SMT) is now the dominant paradigm within Machine Translation, we argue that it is far from clear that it can outperform Rule-Based Machine Translation (RBMT) on small- to medium-vocabulary applications where high precision is more important than recall. A particularly important practical example is medical speech translation. We report the results of experiments in which we configured the various grammars and rule-sets in an open-source medium-vocabulary multilingual medical speech translation system to generate large aligned bilingual corpora for English → French and English → Japanese, which were then used to train SMT models based on the common combination of Giza++, Moses and SRILM. The resulting SMT systems were unable to fully reproduce the performance of the RBMT system: performance plateaued, even for English → French, with fewer than 70% of the SMT translations of previously unseen sentences agreeing with the RBMT translations. When the outputs of the two systems differed, human judges frequently rated the SMT result as worse than the RBMT result, and hardly ever as better; moreover, the added robustness of the SMT yielded only a small improvement in recall, at the cost of a large penalty in precision.
Rayner, M., Estrella, P., Bouillon, P., Hockey, B. A., & Nakao, Y. (2009). Using Artificially Generated Data to Evaluate Statistical Machine Translation. In ACL-IJCNLP 2009 - GEAF 2009: 2009 Workshop on Grammar Engineering Across Frameworks, Proceedings of the Workshop (pp. 54–62). Association for Computational Linguistics (ACL). https://doi.org/10.3115/1690359.1690366