Collective corpus weighting and phrase scoring for SMT using graph-based random walk

Lei Cui; Dongdong Zhang; Shujie Liu; Mu Li; Ming Zhou

Conference Proceedings

Communications in Computer and Information Science (2013), vol. 400, pp. 176–187

DOI: 10.1007/978-3-642-41644-6_17

Abstract

Data quality is one of the key factors in Statistical Machine Translation (SMT). Previous research addressed the data quality problem in SMT by corpus weighting or phrase scoring, but these two types of methods were often investigated independently. To leverage the dependencies between them, we propose an intuitive approach to improve translation modeling by collective corpus weighting and phrase scoring. The method uses the mutual reinforcement between the sentence pairs and the extracted phrase pairs, based on the observation that better sentence pairs often lead to better phrase extraction and vice versa. An effective graph-based random walk is designed to estimate the quality of sentence pairs and phrase pairs simultaneously. Extensive experimental results show that our method improves performance significantly and consistently in several Chinese-to-English translation tasks. © Springer-Verlag Berlin Heidelberg 2013.
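The abstract does not give the update equations, so the following is only a minimal sketch of the mutual-reinforcement idea, not the authors' formulation. It assumes an unweighted bipartite graph (a sentence pair is linked to every phrase pair extracted from it), a PageRank-style damping factor, and uniform initial scores. The function name `mutual_reinforcement`, its parameters, and the toy input are all hypothetical.

```python
def mutual_reinforcement(extractions, damping=0.85, iterations=50):
    """Jointly score sentence pairs and phrase pairs on a bipartite graph.

    extractions[i] lists the ids of phrase pairs extracted from sentence
    pair i (hypothetical toy representation; every list should be non-empty,
    otherwise score mass leaks out of the graph).
    Returns (sentence_scores, phrase_scores).
    """
    n_sent = len(extractions)
    sent_phrases = [sorted(set(ps)) for ps in extractions]
    phrases = sorted({p for ps in sent_phrases for p in ps})

    # Reverse index: phrase pair -> sentence pairs it was extracted from.
    occurs = {p: [] for p in phrases}
    for i, ps in enumerate(sent_phrases):
        for p in ps:
            occurs[p].append(i)

    # Assumption: start from uniform scores on both sides.
    sent = [1.0 / n_sent] * n_sent
    phr = {p: 1.0 / len(phrases) for p in phrases}

    for _ in range(iterations):
        # Each sentence pair splits its score evenly over its phrase pairs.
        new_phr = {}
        for p in phrases:
            flow = sum(sent[i] / len(sent_phrases[i]) for i in occurs[p])
            new_phr[p] = (1 - damping) / len(phrases) + damping * flow

        # Each phrase pair splits its score evenly over its sentence pairs.
        new_sent = []
        for ps in sent_phrases:
            flow = sum(new_phr[p] / len(occurs[p]) for p in ps)
            new_sent.append((1 - damping) / n_sent + damping * flow)

        sent, phr = new_sent, new_phr
    return sent, phr


# Toy corpus: three sentence pairs and the phrase pairs extracted from each.
toy = [["a", "b"], ["a", "b"], ["a", "c"]]
sentence_scores, phrase_scores = mutual_reinforcement(toy)
```

In this unweighted toy the phrase scores largely reflect how many sentence pairs a phrase pair was extracted from. Any quality-sensitive edge weighting would have to come from the paper itself, which this record does not describe.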

Cite

Cui, L., Zhang, D., Liu, S., Li, M., & Zhou, M. (2013). Collective corpus weighting and phrase scoring for SMT using graph-based random walk. In Communications in Computer and Information Science (Vol. 400, pp. 176–187). Springer Verlag. https://doi.org/10.1007/978-3-642-41644-6_17
