Collective corpus weighting and phrase scoring for SMT using graph-based random walk

Abstract

Data quality is one of the key factors in Statistical Machine Translation (SMT). Previous research addressed the data quality problem in SMT by corpus weighting or phrase scoring, but these two types of methods were often investigated independently. To leverage the dependencies between them, we propose an intuitive approach to improve translation modeling by collective corpus weighting and phrase scoring. The method uses the mutual reinforcement between the sentence pairs and the extracted phrase pairs, based on the observation that better sentence pairs often lead to better phrase extraction and vice versa. An effective graph-based random walk is designed to estimate the quality of sentence pairs and phrase pairs simultaneously. Extensive experimental results show that our method improves performance significantly and consistently in several Chinese-to-English translation tasks. © Springer-Verlag Berlin Heidelberg 2013.
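The abstract does not give the update rules, so the following is only a minimal sketch of the general idea of mutual reinforcement on a bipartite graph, not the authors' formulation. It assumes a HITS-style alternating update in which a phrase pair's score is the weighted sum of the scores of the sentence pairs it was extracted from, and vice versa, with each side renormalized to sum to 1. The function name `mutual_reinforcement`, the edge weights, and the toy graph are all hypothetical.

```python
def mutual_reinforcement(edges, n_sentences, n_phrases, iterations=100, tol=1e-12):
    """Alternately propagate scores across a bipartite sentence/phrase graph.

    edges: list of (sentence_index, phrase_index, weight) tuples, where an
    edge means the phrase pair was extracted from that sentence pair.
    The update rule here is an assumed HITS-style scheme, not the paper's.
    """
    # Start from uniform scores on both sides.
    s = [1.0 / n_sentences] * n_sentences
    p = [1.0 / n_phrases] * n_phrases

    for _ in range(iterations):
        # Phrase pairs collect score from the sentence pairs containing them.
        new_p = [0.0] * n_phrases
        for si, pi, w in edges:
            new_p[pi] += w * s[si]
        total = sum(new_p) or 1.0
        new_p = [v / total for v in new_p]

        # Sentence pairs collect score from the phrase pairs they yield.
        new_s = [0.0] * n_sentences
        for si, pi, w in edges:
            new_s[si] += w * new_p[pi]
        total = sum(new_s) or 1.0
        new_s = [v / total for v in new_s]

        # Stop once neither side changes noticeably.
        delta = max(abs(a - b) for a, b in zip(new_s, s))
        delta += max(abs(a - b) for a, b in zip(new_p, p))
        s, p = new_s, new_p
        if delta < tol:
            break

    return s, p


# Toy graph (hypothetical): sentence pairs 0 and 1 both yield phrase pairs
# 0 and 1, while sentence pair 2 yields only the otherwise unseen phrase pair 2.
toy_edges = [
    (0, 0, 1.0), (0, 1, 1.0),
    (1, 0, 1.0), (1, 1, 1.0),
    (2, 2, 1.0),
]
sentence_scores, phrase_scores = mutual_reinforcement(toy_edges, 3, 3)
```

In this toy graph the sentence pairs that share phrase pairs reinforce each other and end up with higher scores than the isolated one, which is the behaviour the abstract describes informally. A real system would need edge weights and smoothing choices that the abstract does not specify.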

Cite

Cui, L., Zhang, D., Liu, S., Li, M., & Zhou, M. (2013). Collective corpus weighting and phrase scoring for SMT using graph-based random walk. In Communications in Computer and Information Science (Vol. 400, pp. 176–187). Springer Verlag. https://doi.org/10.1007/978-3-642-41644-6_17
