Role of paraphrases in PB-SMT

Santanu Pal; Pintu Lohar; Sudip Kumar Naskar

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2014) 8404 LNCS(PART 2) 242-253

DOI: 10.1007/978-3-642-54903-8_21

Abstract

Statistical Machine Translation (SMT) delivers a convenient format for representing how the translation process is modeled. The translations of words or phrases are generally computed based on their occurrence in some bilingual training corpus. However, SMT still suffers from out-of-vocabulary (OOV) words and less frequent words, especially when only limited training data are available or when training and test data come from different domains. In this paper, we propose a convenient way to handle OOV and rare words using a paraphrasing technique. Initially, we extract paraphrases from the bilingual training corpus with the help of comparable corpora. The extracted paraphrases are analyzed by conditionally checking the association of their monolingual distribution. Bilingual aligned paraphrases are incorporated as additional training data into the PB-SMT system. Integrating the paraphrases into the PB-SMT system results in significant improvement. © 2014 Springer-Verlag Berlin Heidelberg.

Cite (APA)

Pal, S., Lohar, P., & Naskar, S. K. (2014). Role of paraphrases in PB-SMT. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8404 LNCS, pp. 242–253). Springer Verlag. https://doi.org/10.1007/978-3-642-54903-8_21
