iCPE: A hybrid data selection model for SMT domain adaptation

Longyue Wang; Derek F. Wong; Lidia S. Chao; Yi Lu; Junwen Xing

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2013), Vol. 8202 LNAI, pp. 280–290

DOI: 10.1007/978-3-642-41491-6_26

Abstract

Data selection is an important technique for enhancing data-driven models, especially in large-scale natural language processing (NLP). Recent research on statistical machine translation (SMT) domain adaptation has focused on using various individual data selection models. In this paper, we propose a hybrid data selection model named iCPE, which combines three state-of-the-art similarity metrics, Cosine tf-idf, Perplexity and Edit distance, at both the corpus level and the model level. We conduct experiments on the Hong Kong Law Chinese-English corpus, and the results show that this simple and effective hybrid model outperforms both the baseline system trained on the entire data and the best rival method. The consistent performance gains of the proposed approach have important implications for mining very large corpora in a computationally limited environment. © Springer-Verlag 2013.
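To make the idea of combining the three metrics concrete, here is a minimal Python sketch of score-level data selection. It is not the authors' implementation: the abstract gives no formulas, so the smoothed tf-idf weighting, the add-one unigram language model used for perplexity, the word-level edit distance, the min-max normalization, the equal weights, and every function name (`select_sentences` and its helpers) are assumptions made purely for illustration. It covers only a corpus-side combination and does not attempt the model-level combination the abstract mentions.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def unigram_perplexity(tokens, counts, total, vocab_size):
    """Perplexity under an add-one smoothed unigram model (assumed LM)."""
    logp = sum(math.log((counts[w] + 1) / (total + vocab_size)) for w in tokens)
    return math.exp(-logp / max(len(tokens), 1))


def edit_distance(a, b):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


def min_max(xs):
    """Rescale scores to [0, 1]; all-equal inputs map to 0.0."""
    lo, hi = min(xs), max(xs)
    return [0.0 if hi == lo else (x - lo) / (hi - lo) for x in xs]


def select_sentences(candidates, in_domain, weights=(1 / 3, 1 / 3, 1 / 3), k=1):
    """Rank general-domain candidates by a weighted sum of three similarities."""
    cand = [c.lower().split() for c in candidates]
    ind = [s.lower().split() for s in in_domain]

    # Smoothed idf over all sentences (assumed weighting scheme).
    docs = cand + ind
    df = Counter(t for d in docs for t in set(d))
    idf = {t: math.log((1 + len(docs)) / (1 + df[t])) + 1 for t in df}

    def tfidf(tokens):
        return {t: c * idf[t] for t, c in Counter(tokens).items()}

    # In-domain data is pooled into one pseudo-document for the cosine score.
    in_vec = tfidf([w for s in ind for w in s])
    counts = Counter(w for s in ind for w in s)
    total, vocab = sum(counts.values()), len(counts) + 1  # +1 for unseen words

    cos = [cosine(tfidf(c), in_vec) for c in cand]
    # Negate perplexity so that higher always means "more in-domain".
    ppl = [-unigram_perplexity(c, counts, total, vocab) for c in cand]
    # Best normalized edit similarity against any in-domain sentence.
    ed = [max(1 - edit_distance(c, s) / max(len(c), len(s), 1) for s in ind)
          for c in cand]

    w_cos, w_ppl, w_ed = weights
    combined = [w_cos * a + w_ppl * b + w_ed * c
                for a, b, c in zip(min_max(cos), min_max(ppl), min_max(ed))]
    ranked = sorted(range(len(candidates)), key=lambda i: combined[i], reverse=True)
    return [candidates[i] for i in ranked[:k]]
```

Calling `select_sentences(general_sentences, legal_sentences, k=1000)` would keep the 1000 candidates judged closest to the in-domain data under these assumed scores. The weights and the cut-off `k` are hypothetical parameters that would normally be tuned on held-out data.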

Citation (APA)

Wang, L., Wong, D. F., Chao, L. S., Lu, Y., & Xing, J. (2013). iCPE: A hybrid data selection model for SMT domain adaptation. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8202 LNAI, pp. 280–290). https://doi.org/10.1007/978-3-642-41491-6_26
