Statistical Machine Translation (SMT) provides a convenient framework for modeling the translation process. The translations of words or phrases are generally computed from their occurrences in a bilingual training corpus. However, SMT still suffers from out-of-vocabulary (OOV) words and infrequent words, especially when only limited training data are available or when the training and test data come from different domains. In this paper, we propose a convenient way to handle OOV and rare words using a paraphrasing technique. We first extract paraphrases from the bilingual training corpus with the help of comparable corpora. The extracted paraphrases are then analyzed by conditionally checking the association of their monolingual distribution. The bilingually aligned paraphrases are incorporated as additional training data into the phrase-based SMT (PB-SMT) system. Integrating the paraphrases into the PB-SMT system results in a significant improvement. © 2014 Springer-Verlag Berlin Heidelberg.
CITATION STYLE
Pal, S., Lohar, P., & Naskar, S. K. (2014). Role of paraphrases in PB-SMT. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8404 LNCS, pp. 242–253). Springer Verlag. https://doi.org/10.1007/978-3-642-54903-8_21